Wednesday, December 31, 2008

Hits the Spot!

I am absolutely not going to this dinner posted below ($75 per person, yeah, uh huh), but it looks absolutely delicious to me. A Hawaiian/Creole fusion. Yum, yum. I got this because I am on a jazz performances email listing in Hawaii (I've only been to one) and one of Oahu's most famous trumpeters, DeShannon Higa, is playing with his crew at this event. But I looooove the menu.

I think I would go with the...

Sweet potato gumbo
Spicy ahi poke stack (chunks of raw tuna; ahi is the same word in Japanese and Hawaiian, and it actually is a coincidence; it's the thing in the picture)
Caesar salad
Shrimp, Andouille, and Hawaiian Snapper Étouffée
Hawaiian-style Beignets (these are more commonly called malasadas and are indeed similar to beignets, except they are Portuguese descended; the Portuguese were the major non-Asian-Pacific people who came to work the plantations, so there's some lingering cultural influence).

Or maybe I'm having Louisiana food withdrawal.

Monday, December 29, 2008

Headed Down Under, not Down South

Well, good news and bad news. On the bad news front, I was in a "sound like Elvis" karaoke contest. My version of Teddy Bear was good enough to get me to round 2, but apparently my Can't Help Falling in Love with You was insufficient to make round 3. Sigh. No Graceland for me!

In other news, I submitted an abstract to present some of the work on Korean apologies at the International Conference of Pragmatics, or some similar title, and heard this morning that my co-author and I were accepted to give a presentation. Therefore, if I can scrounge up the money, I am now headed to Melbourne, Australia for a week in July. Never been to Australia. I think Indonesia is as close as I've gotten. So that's very cool. Not quite as cool as Graceland, but Van Diemen's Land* will have to do.

So if anyone knows super cheap hotels in Melbourne....

*ok, not really going to Tasmania, but I needed a land for the style of the sentence and it's close enough for us Yanks.

Sunday, December 28, 2008

Textbook - Section 6

Hello.

This is the final section of the textbook chapter. There are Further Readings and Homework and Discussion sections too, but I doubt any of you are that serious.

From the Language Computer Back to the Language Human

This chapter has focused so far on how we as humans use computers to talk to each other and we’ve seen that the more we ask of computers the more like us they have to become. On an everyday basis, what this means for language engineers is that they often spend time reading all about humans and language. All the areas discussed in other chapters of this book, grammar, phonetics, psychology of language, society and language, etc., can be evidence for how a computer can be created to do what we do. However, researchers in these human-focused areas of language study have also been looking increasingly to computers for insight as well. Just as we might figure out how to create a computer by studying people, seeing how a computer does something can help us figure out how the person does it. There are two main components to this: Corpus Linguistics and computational linguistics, particularly computational modeling.

We’ve already run into corpora (corpora is the plural of corpus) in this chapter. A corpus, again, is a body of language data. Technically a corpus can be any size. Our one sentence text message was a corpus and we studied it to find patterns of language use in text messaging. A larger corpus filled with tens of thousands of text messages would allow for much larger questions to be asked, such as: how common are consonant-only abbreviations? How about the smileys? Are the text messages from women distinct from those of men? Is text messaging ever used to speak to someone of higher status, or is it only to people of same status and lower? Software tools can help us scan millions of messages, which would be infeasible to do by hand.
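To make the idea of scanning a corpus concrete, here is a minimal sketch in Python. The four messages are invented for illustration, and the definitions of a consonant-only abbreviation (two or more letters, none of them a, e, i, o, or u) and of a smiley (a small set of common patterns) are simplifying assumptions, not standard definitions from corpus linguistics.

```python
import re
from collections import Counter

# A tiny made-up corpus; a real study would load thousands of messages.
messages = [
    "omg u talk so diff in txt msgn!!!!! :)",
    "c u l8r :)",
    "pls txt me when u get home",
    "that was so funny lol :D",
]

VOWELS = set("aeiou")
# Assumption: only a handful of common smiley shapes, such as :) ;-) :D :P
SMILEY = re.compile(r"[:;]-?[)(DP]")


def is_consonant_only(token):
    """True for alphabetic tokens of 2+ letters that contain no vowel letters."""
    return len(token) >= 2 and token.isalpha() and not (set(token) & VOWELS)


def count_patterns(corpus):
    """Count tokens, consonant-only abbreviations, and smileys in a corpus."""
    counts = Counter()
    for message in corpus:
        counts["smileys"] += len(SMILEY.findall(message))
        # Remove smileys first so the D in :D is not counted as a word.
        text = SMILEY.sub(" ", message).lower()
        for token in re.findall(r"[a-z0-9]+", text):
            counts["tokens"] += 1
            if is_consonant_only(token):
                counts["consonant_only"] += 1
    return counts


counts = count_patterns(messages)
print(counts["consonant_only"], "of", counts["tokens"], "tokens are consonant-only")
print(counts["smileys"], "smileys")
```

The same loop would run unchanged over millions of messages; the hard part in practice is deciding what counts as an abbreviation or a smiley, which is exactly the kind of question the corpus forces us to answer precisely.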

Large corpora have often revealed facts about our language that were not realized before. For instance, there have been claims that children never hear certain types of grammatical patterns, and so there is no way for a child to learn them. However, analysis of large databases of real language reveals just those grammatical patterns. Corpora can open up a world of language details that we’ve never had access to in the past. You can try out some basic analysis on one of the world’s largest corpora today if you wish, namely Google. Want to find out if two words ever occur together in speech? Enter them into Google and see.

Linguists are also using computational models more and more in their study of language use. One reason for this is eminently practical. The strict formalism we’ve been discussing in earlier sections requires a linguist to really find out if they know what they’re talking about. To put it another way, we might think we have a very clear idea of what a grammatical subject is. Then we try to tell an unthinking computer how to find the subject and discover there are a lot of details we forgot about. As another example, let’s say we have a hypothesis about speaking that involves 1) coming up with some sort of meaning to express, 2) putting that meaning into the right grammatical structure, and then 3) finding the words to fill out that structure. It may seem very clear to us as we sit in our armchairs thinking of it. But when you go to teach a computer that clear idea, you discover that your ideas on how grammar relates to words were fuzzy or that your notion of the meaning of verbs is incompatible with your notion of the meaning of sentences.

Beyond using the computer’s formalism to straighten our own thinking out, computational models can sometimes predict what humans would do in a similar situation. Language is an enormously complicated system and it is often difficult to see how tweaking one bit here will relate to the entire system. If a psychologically plausible model can be created for some small area of language use, we can run simulations of our linguistic psychology right on the computer.

This spiral of human to computer to human again is perhaps best seen with connectionist models of language. Connectionist models, also called neural networks, are computational models based upon some properties of the human brain. In these networks, neuron-like models become activated based upon the data they encounter and then “wire” together to learn the patterns of the data. Such models can of course be trained with language data and then make predictions about how humans would react to the same language. One example is the study of the effects of brain lesions on language comprehension and production. It is unethical to purposefully destroy part of a real person’s brain to see what effects damage to that part of the brain (called a brain lesion) would have on language. However, one can simulate brain lesions on neural networks without going to jail and therefore safely test hypotheses.
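The lesioning idea can be sketched with a deliberately tiny toy. This is not a model from the research literature: it is a single output unit trained with a simple Hebbian rule on two made-up input patterns, and a "lesion" is simulated by cutting (zeroing) chosen connections. All names and patterns here are assumptions for illustration only.

```python
def train_hebbian(patterns):
    """Hebbian learning: change each connection by input * target."""
    n_inputs = len(patterns[0][0])
    weights = [0.0] * n_inputs
    for inputs, target in patterns:
        for i, x in enumerate(inputs):
            weights[i] += x * target
    return weights


def respond(weights, inputs):
    """Return +1 or -1 by the sign of the activation, or 0 if there is none."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    if activation > 0:
        return 1
    if activation < 0:
        return -1
    return 0


def lesion(weights, damaged_units):
    """Simulate damage by cutting the connections from the given input units."""
    return [0.0 if i in damaged_units else w for i, w in enumerate(weights)]


def accuracy(weights, patterns):
    """Fraction of patterns for which the unit gives the target response."""
    correct = sum(1 for inputs, target in patterns
                  if respond(weights, inputs) == target)
    return correct / len(patterns)


# Two made-up input patterns, each paired with the response we want.
patterns = [
    ([1, 1, 1, 1, -1, -1, -1, -1], 1),
    ([1, -1, 1, -1, 1, -1, 1, -1], -1),
]

weights = train_hebbian(patterns)
print("intact:       ", accuracy(weights, patterns))
print("mild lesion:  ", accuracy(lesion(weights, {1}), patterns))
print("severe lesion:", accuracy(lesion(weights, {1, 3, 4, 6}), patterns))
```

In this toy, cutting one connection leaves the responses intact while cutting all four informative connections wipes them out. A researcher would compare such damage patterns against what is observed in patients, using far larger networks trained on real language data.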

In all realms, we reveal who we are by what we do. One might also say that we reveal who we are by what we create. As we attempt to create more sophisticated tools to accomplish tasks that humans alone, as far as we know, can accomplish, we have to put more and more of ourselves into the tool. With computers, particularly in the realm of language, this is not just an abstract idea, but an accurate description of what is happening in language labs and industry offices all around the world. We study ourselves to understand computers and we study computers to understand ourselves.

Monday, December 22, 2008

Textbook - Section 5

We're almost there! Only one more section after this. Believe it or not, the whole chapter's only 6,000 words.

Anyway, here we finish up some of the issues to handle in getting machine translation to work, and then move into the next section, which introduces the very, very basics of natural language processing (of sorts) which is just the term for getting computers to be able to handle natural language, such as English.

And I just realized that FairyHedgeHog stated in the comments that I was suggesting that there's no Babel Fish. I had assumed she'd read my little section on Babel Fish when saying this, but I now realize that that section wasn't posted yet. But it is now about 3-4 paragraphs down. Off we go:

...


There are further problems with word-to-word translation: Many words in one language simply do not exist in another language. This might occur because a word’s meaning is nuanced and complex. Examples of this in English might be smarmy or punk’d. Translators enjoy making lists of such hard-to-translate words. An example from one recent such list is mamihlapinatapei, a word in Yagan. Apparently, it means something like, “implying a wordless yet meaningful look shared by two people who both desire to initiate something but are both reluctant to start.” That’s quite a complex idea, and it is not surprising that many languages of the world do not have a word for it. Other difficult-to-translate words seem fairly clear in meaning; there just happens not to be a single word for them in English. An example might be iktsuarpok in Inuit, which means “to go outside to check if anyone is coming.” Or perhaps cafuné in Brazilian Portuguese, meaning, “to tenderly run one’s fingers through someone’s hair.” Both of these words appear to be rather useful words and it might be nice if English had them. But it doesn’t.

In such cases, what we need is not a mapping from one dictionary to another, but simply someone to understand the meaning of the difficult word and tell us in English. However, this is asking a great deal indeed from a computer. The ideal computer translator needs to know not just the list of words in each language and how they correspond to one another but also the grammar of all languages we are interested in, the pragmatics of each society’s relationships, and the very meaning of what is being said. In fact, we are getting very close to asking the computer to be human, to know what we know about the world.

How to Get a Computer to Talk to Us

The reader who has just finished the section on how difficult translation is might be thinking, “but don’t we have some machine translation already?” Indeed, we do. Perhaps the best known is Babel Fish, as found on Alta Vista and Yahoo!. Babel Fish is a free machine translation service offered on the Web, and it has an extremely tough task. The user can paste any text she chooses, in any of a series of languages, into its window and ask for a translation into another language. It’s critical to hammer on the point that anything could be put into its text window. Therefore, to expect such a site to always do spectacularly is indeed like expecting it to be almost human, and it’s clearly not.

Typing a sentence into Yahoo! Babel Fish, translating it to another language, and then translating it back reveals this. If you take the English sentence the boy kicked the ball, translate it to Chinese and then translate it back, you get the boy played with the ball. This is similar in meaning but not the same, as the action of kicking has been lost. Going to Japanese and back gives us the boy kicked sphere. This is the right meaning, just never something an English speaker would say. However, Babel Fish completely jumps the shark when you go to Korean and back, returning the boy l where kicks hard public affairs.

Machine translation can do much better, however, if you give it a far simpler task. Poor Babel Fish has to handle anything anyone could ever say in its languages, like a human does. If you simplify the problem by only handling certain types of communication, the computer can become truly helpful. Examples might be: just focusing on scheduling meetings between people or just translating technical reports on aeronautics. Domain Analysis is the process of formally defining a precise problem to be concerned with. The idea is straightforward. You focus on a tiny part of the whole world, defining exact criteria for what is within the domain you will handle. If you do this, you have a shot at actually giving the computer much of the knowledge it will need to perform reliable translation.

Let’s say, for instance, that you want the computer to be able to handle language about bill payments. Bill payment is such a restricted domain that a team of linguists can teach the computer about how the world of bill payments works. Part of this process is providing the computer with an Ontology. In language engineering, an ontology is a formal structured list of the things that exist in that domain. In the domain of bill payment, that would include things such as bank accounts, bills, due dates, transaction dates, amounts, currency, and so on. The ontologist defines critical concepts in the domain, often called classes. These classes in turn will have various features with restrictions on those features. For instance, a credit card payment could be a class in the ontology with a feature such as the payment amount.

If we were to build an ontology of the Child Play domain, it would likely prevent many mis-translations, such as we saw in the English – Korean – English example above. Virtually no ontology of child play would be concerned with public affairs and so any ambiguities in words, which led Yahoo! Babel Fish to wander into public affairs land, would be ruled out.
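The credit card payment class mentioned above can be sketched as a small Python class. This is only an illustration of "a class with features and restrictions on those features". The feature names, the list of allowed currencies, and the is_late check are all assumptions made up for this example, and real ontologies are usually written in dedicated ontology languages rather than in a programming language.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical restriction: the only currencies this toy domain knows about.
ALLOWED_CURRENCIES = {"USD", "EUR", "KRW"}


@dataclass(frozen=True)
class CreditCardPayment:
    """One class in a toy bill-payment ontology."""
    amount: float
    currency: str
    due_date: date
    transaction_date: date

    def __post_init__(self):
        # Restrictions on the features: reject values outside the domain.
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        if self.currency not in ALLOWED_CURRENCIES:
            raise ValueError(f"unknown currency: {self.currency}")

    def is_late(self):
        """A payment is late if it was made after its due date."""
        return self.transaction_date > self.due_date


payment = CreditCardPayment(
    amount=42.50,
    currency="USD",
    due_date=date(2009, 1, 15),
    transaction_date=date(2009, 1, 20),
)
print(payment.is_late())
```

The restrictions are what give the computer its narrow view of the world: anything that is not a positive amount in a known currency simply cannot be a payment in this domain.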

These formal statements concerning language are needed across the board to handle deep language translation – for word recognition, for language meaning, for grammar, etc. For instance, we discussed the need to distinguish subject from object when translating. Among other reasons, this was needed so that the verb could be made to agree with the subject (but often not the object) and so that one could re-order words depending on the language (the object is after the verb in English but before it in Japanese, German, or Korean). How could you tell a computer how to find the subject in an English sentence?

One initial approach might revolve around defining noun phrases, and then telling the computer where in a sentence the subject noun phrase is located. First, look up each word in the sentence and find which words are nouns. Then, instruct the computer that nouns are often part of noun phrases. Let’s take the boy kicked the ball again as our example. If the computer looks up each word, it will find at least boy and ball are listed as nouns. Kick will be listed as well, as in he gave the ball a swift kick, but in this case, kick has the –ed ending marking it as a verb, not a noun. Next, the computer has a grammatical rule telling it that nouns often occur with determiners (articles) such as the and a. And so it pairs the boy and the ball into noun phrases. Finally, the computer has to decide which of these noun phrases the verb should agree with. The computer’s programmer might put in a rule saying that the noun phrase before the verb in English is most likely to be the subject, while the noun phrase after the verb is most likely to be the object.
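The steps just described can be sketched in a few lines of Python. This is a toy under stated assumptions: the five-word lexicon is made up, the –ed rule is the only clue used to spot verbs, and "the noun phrase before the verb" is taken to mean the nearest one. It is not how a real parser works.

```python
# A tiny made-up lexicon listing the possible categories of each word.
LEXICON = {
    "the": {"det"}, "a": {"det"},
    "boy": {"noun"}, "ball": {"noun"},
    "kick": {"noun", "verb"},
}


def tag(word):
    """Pick one category per word; an -ed ending marks a known verb."""
    if word.endswith("ed") and "verb" in LEXICON.get(word[:-2], set()):
        return "verb"
    categories = LEXICON.get(word, set())
    for category in ("det", "noun", "verb"):  # crude priority order
        if category in categories:
            return category
    return "unknown"


def find_subject_and_object(sentence):
    """Return (subject, object) noun phrases, or None where the rule finds nothing."""
    words = sentence.lower().split()
    tags = [tag(w) for w in words]

    # Pair determiners with nouns to build noun phrases.
    phrases = []  # list of (start_index, phrase)
    i = 0
    while i < len(words):
        if tags[i] == "det" and i + 1 < len(words) and tags[i + 1] == "noun":
            phrases.append((i, " ".join(words[i:i + 2])))
            i += 2
        elif tags[i] == "noun":
            phrases.append((i, words[i]))
            i += 1
        else:
            i += 1

    verb_index = next((k for k, t in enumerate(tags) if t == "verb"), None)
    if verb_index is None:
        return None, None

    # The rule: nearest noun phrase before the verb is the subject,
    # first noun phrase after the verb is the object.
    before = [p for idx, p in phrases if idx < verb_index]
    after = [p for idx, p in phrases if idx > verb_index]
    return (before[-1] if before else None, after[0] if after else None)


print(find_subject_and_object("the boy kicked the ball"))
```

For the boy kicked the ball this returns the boy as subject and the ball as object. For the boy near the ball kicked a ball it picks the ball as the subject, because that is the noun phrase closest to the verb, which is a small taste of how quickly such rules go wrong.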

This particular rule would fail rather quickly, but the necessity of specifying precise patterns should be clear. The more restricted the domain that the computer must cope with, the more superficial the computer’s idea of grammar and meaning can be. Contemporary computers are quite successful in dealing with human language in highly restricted domains. As the domain grows, however, to be more and more like the everyday world humans live in, the more human-like the computer becomes as well. Such a machine translator approximates being an android instead of a desktop computer. To talk to a computer, to really talk to it like we talk to each other, the computer might need to be a silicon version of ourselves. JAKE NOTE: I HAD AN ENTIRE SPEECH RECOGNITION SECTION HERE THAT I THOUGHT WOULD BE COOL, BUT SPACE IS ALMOST GONE, SO I’M KILLING IT. LET ME KNOW IF YOU WANT TO PUT IT BACK IN (WHICH MEANS TELL ME IF YOU WANT ME TO WRITE IT).

Merry Xmas

Hi all, I did The Twelve Days of Christmas in Hawaii on my karaoke site.

Number 1 day of christmas my tutu give to me, one mynah bird in one papaya tree, etc. My voice is non-functional right now due to a cold, but I hope that's okay on this tune, since it's mostly humorous. On the positive side, I can hit lower notes than normal.

http://www.singsnap.com/snap/r/bcf0b82c

Saturday, December 20, 2008

Textbook - Section 4

Howdy,

As we continue our flying tour of language and computers, we move from talking to other people in the same language through computers to talking to people in different languages on computers, i.e., we are moving into machine translation. This is going to start getting increasingly tech-y, but it never gets hardcore at all. Anyway, here's the next bit. Unfortunately, there's a lot of formatting that's getting lost here. I've tried to put most of it back.

My Trusty Speech-o-Matic!

We’ve talked so far about how the language we use online can be a window into our own subconscious knowledge of our language and how the communicative abilities of the Internet have increased language opportunities. One of the greatest dreams for computers is that they would somehow allow us to communicate across languages. The dream version might be the supposed Universal Translator as seen in the Star Trek world, or its companion in a million other movies. Anyone in the universe walks up, speaks through whatever means their alien bodies allow, and magically the computer translator pops everything out as Mainstream American English. Brilliant! Could it ever be possible? How would it work?

Perhaps the best way to start looking at Machine Translation is by looking at what it is not. Translation is not substituting the words of one language for the words of another language. This may work for some sentences, but it quickly breaks down. Let’s take the English sentence Sylvia goes to the library and attempt to translate it into French. There are parallel words in French for most of this sentence. English to go is similar to French aller; English to is similar to French à; English the is similar to French la; and English library is similar to French bibliothèque. So let’s just substitute one word for another. Sylvia goes to the library becomes Sylvia va à la bibliothèque.

That turned out fairly well, but I’ve already fudged things a bit. You might have noticed I said that to go is similar to aller, but then I used va in the sentence. This is because French, like English, has different verb forms based upon tense, aspect, and mood. JAKENOTE, DO THEY KNOW ALL OF THESE TERMS FROM EARLIER CHAPTERS? There are many parallels between French and English verb forms, but they do not always match. French verbs, for instance, mark whether the subject of the sentence is first person, second person, or third person right on the verb, while English does not by and large. Therefore, to translate any verb into French, we need to know what the subject is. Since this is not part of the English verb, we cannot read it from the word itself. Instead the machine translator must know what the subject is. This, in turn, requires a grammatical analysis of the sentence.
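Here is a minimal Python sketch of that point. The word list covers only the library sentence, the function assumes the sentence always has the shape subject + go/goes + the rest, and any subject it does not recognize (such as a name) is assumed to be third person singular. Those shortcuts are assumptions for illustration. The present-tense forms of aller are the real French ones.

```python
# Word-for-word dictionary for the rest of the sentence.
WORD_FOR_WORD = {"to": "à", "the": "la", "library": "bibliothèque"}

# Present-tense forms of aller, chosen by the subject's person and number.
ALLER_PRESENT = {
    (1, "sg"): "vais", (2, "sg"): "vas", (3, "sg"): "va",
    (1, "pl"): "allons", (2, "pl"): "allez", (3, "pl"): "vont",
}

# What the translator must know about the subject (you is treated as singular).
SUBJECT_FEATURES = {"i": (1, "sg"), "you": (2, "sg"), "we": (1, "pl"), "they": (3, "pl")}
SUBJECT_WORDS = {"i": "je", "you": "tu", "we": "nous", "they": "ils"}


def translate(sentence):
    """Translate 'SUBJECT go/goes ...' into French, conjugating by subject."""
    words = sentence.lower().split()
    subject, verb, rest = words[0], words[1], words[2:]
    if verb not in ("go", "goes"):
        raise ValueError("this toy only knows the verb to go")
    # Assumption: unknown subjects, such as names, are third person singular.
    person = SUBJECT_FEATURES.get(subject, (3, "sg"))
    french = [SUBJECT_WORDS.get(subject, subject.capitalize()), ALLER_PRESENT[person]]
    french += [WORD_FOR_WORD.get(w, w) for w in rest]
    result = " ".join(french)
    return result[0].upper() + result[1:]


print(translate("Sylvia goes to the library"))
print(translate("We go to the library"))
```

Notice that the English verb go is identical in I go, we go, and they go, so the verb form in the French output can only come from looking at the subject.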

A similar problem occurs with translations between English and German. English generally follows a sentence pattern of subject-verb-object, while German follows a pattern of subject-object-verb, i.e., the verb goes in the middle for English sentences but at the end for German sentences. JAKE NOTE, A POINTER SOMEWHERE IN THIS PARAGRAPH TO YOUR TYPOLOGY CHAPTER SOUNDS APPROPRIATE. In such a case, even if there were a perfect one-to-one match between English words and German words, the translator again cannot simply substitute one word for another. It needs to know what the subject is, what the object is, and what the verb is in order to re-arrange things. In short, any non-trivial translation requires that the computer be able to grammatically analyze any sentence it encounters. Even this might seem simpler than it is. Most of the sentences that you have read in this chapter have never been written before, and never encountered by you, the reader. (Some of you may wish you never encounter sentences like this again.) These exact sentences will not be in any database of the English language. Instead, you the reader, and the computer translator, must know English grammar sufficiently to analyze sentences you’ve never heard.
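The re-arranging step itself is easy to sketch once the grammatical analysis has been done. In the Python sketch below, the roles are handed to the function ready-made, which is an assumption that hides all the hard work. It also follows this chapter's simplified picture of word order as one fixed pattern per language, and the words are left in English so the reordering is easy to see.

```python
ROLE_NAMES = {"S": "subject", "V": "verb", "O": "object"}


def reorder(roles, pattern):
    """Arrange already-identified roles according to a pattern such as 'SOV'."""
    if sorted(pattern) != ["O", "S", "V"]:
        raise ValueError("pattern must use each of S, V and O exactly once")
    return " ".join(roles[ROLE_NAMES[letter]] for letter in pattern)


# The grammatical analysis is supplied by hand here.
analysis = {"subject": "the boy", "verb": "kicked", "object": "the ball"}

print(reorder(analysis, "SVO"))  # subject-verb-object order
print(reorder(analysis, "SOV"))  # subject-object-verb order
```

The function is trivial on purpose: everything difficult lives in producing the analysis dictionary for a sentence the computer has never seen before.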

Guess what? The problem gets worse. Let’s say we want to translate between English and either Japanese or Korean. Both of those languages express what are called honorifics right in their grammar and verb forms. Honorifics are sort of a grammatical form of politeness and their use depends upon a series of factors such as the status of the person being spoken to, the status of the person speaking, the intimacy of their relationship, and more. English deals with many of these same issues. We do not speak the same way when giving a talk at a funeral as we do while playing Guitar Hero III with a friend. Nor do we talk the same way to our lovers as we do to our company’s CEO -- if you want to keep your job. These social factors are expressed through the word choices we make, the elaborateness of the phrasing, and the assertiveness with which we speak. JAKE NOTE; AGAIN, CAN WE REFERENCE THE SOCIO CHAPTERS FOR THIS? Japanese and Korean use these methods to express social relationships as well, but they additionally code some of this in the exact verb endings used in a sentence. This implies that the computer cannot perform a true translation of Sylvia went to the library into Korean or Japanese without some pragmatic knowledge of each society – who is of high status, who is of low, and what their relationships are like. Such knowledge is, of course, nowhere in the sentence itself, but requires knowing the entire social context of the sentence.

Friday, December 19, 2008

More karaoke!

I did another tune! This time I'm pretending to be the old Bingle, Bing Crosby, singing Swinging on a Star. I have about his range naturally, so I always like singing tunes he made famous. Instead of an embed, here's the link:

http://www.singsnap.com/snap/r/b0126ac54

As you can see from the link, I am part of Sing Snap, under the name pacatrue. (In fact, I think this link might take you to a list of everything I've done, which is a total of 2 songs. I'm thinking of doing a Christmas song week and then an 80s nostalgia week, but I rarely have discipline to follow such things.) It seems good, but I don't have tons of experience. I went by an earlier link in my Google list and it seemed all too contemporary. I'm really not too good with the Jay-Z or the Kelly Clarkson. But Sing Snap had older and newer stuff, though not really enough old skool R&B, meaning no Isley Brothers! no Kool and the Gang! but I'll live.

If anyone wants to join Sing Snap and have an 80s throw down, yell.

Textbook - Section 3

Hi,

This is continuing in the textbook chapter. If you are actually reading this, please leave a comment (just one person is enough) so I can know whether to keep posting these.

This section is about people practicing heritage languages over the Internet. I'd particularly like to know if anyone thinks parts of this section could be offensive, since I touch on issues of cultural labeling, including such lovely issues as the FOB vs Banana war.

From Linguistic Change to Linguistic Preservation

While many are concerned with the potential for computers to alter language, computers are also used every day to preserve languages, both for an individual speaker and for communities. The Internet in particular provides opportunities for heritage language speakers or emigrants to connect to a language community that could be inaccessible otherwise. While the amount of use of the Internet with a heritage language will vary enormously from community to community, groups of people all around the world constantly use the Internet to connect to people they would not be able to speak with otherwise, and they do this across language boundaries. This chance to connect can be vital for some heritage speakers who would otherwise have difficulty developing their abilities in the heritage language.

Jin Sook Lee at the University of California – Santa Barbara recently profiled the experiences of two sisters, who were born in Korea but moved to the U.S. as children, as they used the Korean online social networking site Cyworld (http://www.cyworld.com) to maintain their knowledge of the Korean language and stay in touch with contemporary Korean culture. Due to the circumstances of each sister, there was a substantial difference in Korean language ability between the two. The older sister, Jendy, was far more comfortable speaking Korean and seeking out monolingual speakers of Korean as friends online, while the younger sister, Lizzy, largely only spoke Korean to her parents and otherwise used English. Lizzy’s Korean abilities appeared to be deteriorating over time as friends she saw during visits to Korea would tell her that her Korean had gotten worse since the last visit.

For both Jendy and Lizzy, the greatest benefit of participating in the online Cyworld was constant access to other Korean speakers. The older sister Jendy was able to increase the number of people she regularly spoke Korean with from 5-6 in California to 40, including some who only spoke Korean. Lizzy was far more uncertain about her language skills and almost entirely kept her social network confined to people she already knew in life, but even this conservative approach increased her circle of Korean-speaking friends from 2-3 to 15-20.

Several other features of Cyworld and the Internet generally also helped in Korean practice. Both sisters were frequently asked about Korean pop stars by other friends. Stories and gossip on this topic were readily available on Cyworld, providing the sort of content that could have been hard to get without the web connection. Both sisters reported more freedom to make mistakes in language online without being overly embarrassed about it. The additional barrier of a social networking site itself seemed to decrease worry about mistakes, and the ability to lie about errors, making false claims of typos for instance, helped as well.

In the end, both Jendy and Lizzy reported increased vocabulary, syntactic knowledge, and cultural participation as a result of participating in Cyworld. As seems to always be the case, there were social drawbacks to being on Cyworld as well. Both Jendy and Lizzy were sometimes labeled as FOBs (for Fresh Off the Boat) by Korean-American peers in California for “Cy-ing”. Lizzy in particular seemed to be having trouble navigating the social pressures. On the one hand, she was a FOB for being in Cyworld. At the same time, she expressed a worry that without the Internet, she wouldn't know any Korean and would be labeled a twinkie (a snack cake which is yellow on the outside, white on the inside, expressing the notion that the person’s Asian physical appearance covered a non-Asian inside).

While much language maintenance occurs on a personal level, such as the story of Lizzy and Jendy, organizations frequently also get involved to teach or document a language via computer. Often, the use of computers is tangential to the language teaching. For instance, the Nkwusm Salish Language Revitalization Institute maintains a school for children to learn the Salish language as spoken on the Flathead Indian Reservation in Montana. They maintain a web site, http://salishworld.com, to advertise the mission of the school, raise funds, and provide information about their programs. But the language teaching occurs at the school itself.

A far more extensive use of computers to maintain a language is represented by the efforts of the Choctaw Nation of Oklahoma. Members of the Choctaw Nation live in many different towns, sometimes in numbers far too small to pay for a dedicated Choctaw language teacher in every community. This is compounded by the small number of teachers capable of teaching Choctaw. Therefore, the government of the Choctaw Nation created an online Choctaw School offering classes over the computer. This allows a staff of fewer than ten to offer programs in 40 high schools, including three levels of the Choctaw Language completely online.


Wednesday, December 17, 2008

My first Karaoke!

Moonie will be so proud.

Several weeks ago I was sitting in a hotel restaurant working when I heard some karaoke coming from the hotel bar. I was dragged on to a karaoke stage by friends about 10 years ago, but otherwise I've been karaoke free during my time on earth. However, they were singing great classic tunes in the bar that night like Sinatra and Dean Martin, and it sounded like real fun. And so I've been working up the courage to try karaoke again. I haven't actually succeeded in that yet.

But. Tonight I found various sites where you can do online karaoke, and I've been singing ever since B went to bed. For a while I was attempting to be serious and passably good. Never really pulled that off. But then I realized I can be silly and stupid and that worked pretty well. And so with much ado, I now present to you my first ever solo karaoke in my life.

Elvis' Blue Christmas. There are some technical weird things about this process and there's some weird things as well about the singer, so I'm a bit off rhythmically until the second verse. I'm just off tonically after that. Anyone wants to have online karaoke duels in the future.... you are totally on.

Voice Blog

The voice assignment from McKoala and Robin S was to record some boring drivel like the phone book as if we are trying to singlehandedly take everyone who heard our voice to hell. OK, it was to read it sexy. Since I am nothing if not sexy, I've recorded a version.

Here it is as a wav file. Should open Media Player, iTunes or something.

Textbook - Section 2

This is a continuation of the thrilling textbook chapters on language and computers. This is my stunning linguistic analysis of texting. It may be best to read section 1 first. Oh, I should say that the standard in linguistics when you mention a word is to put it in italics, not put quotes around it. In a subject that deals with language, this happens every few sentences. So if I want to say that the word pig is a noun, I say: pig is a noun. Pig is the word that refers to pigs. Section 2 (by the way, there are footnotes that are not being copied across with this):

Example 1: omg u talk so diff in txt msgn!!!!! :)

The first “word” is omg, standing for “oh my god” in standard English, an exclamation of surprise. omg is what’s termed an Initialism, since it is composed of the initial letters of each word. Initialisms are quite common in text messaging, but of course they did not originate there at all. People have been calling the United States of America the USA for quite a long time. Perhaps, however, initialisms like omg are confined to more casual discourse, such as between friends or family. Surely, an initialism like omg would never be appropriate in academic discourse or a business environment? This does not seem to be true either. Academics actually love to name their conferences with Initialisms: The GALANA conference is the Generative Approaches to Language Acquisition in North America conference; the CUNY conference is the City University of New York conference. If someone were to form a 2010 Conference Against the Spread of Acronyms, most people would likely call it CASA 2010.
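The recipe for an initialism is simple enough to write down as a few lines of Python. The one wrinkle is that small words such as of, the, to, and in are usually skipped. The exact list of skipped words below is an assumption chosen to fit the examples in this section, not a general rule of English.

```python
# Assumption: these small words are left out when forming an initialism.
SKIPPED_WORDS = {"of", "the", "to", "in"}


def initialism(phrase, skip=SKIPPED_WORDS):
    """Join the first letter of every word that is not in the skip list."""
    return "".join(word[0].upper() for word in phrase.split()
                   if word.lower() not in skip)


for phrase in [
    "oh my god",
    "United States of America",
    "City University of New York",
    "Generative Approaches to Language Acquisition in North America",
    "Conference Against the Spread of Acronyms",
]:
    print(initialism(phrase), "=", phrase)
```

Going the other direction, from omg back to oh my god, is the hard part, since nothing in the letters themselves tells you which words they stand for.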

Corporate environments are not free of Initialisms either. I recently ran across a sign that read, “HMA CME FAC MTG RM 200,” which, as it turns out, stands for “Hawaii Medical Association Continuing Medical Education Facilities Accreditation Committee meeting, room 200.” Without knowledge of the abbreviations, of course, such a sign is meaningless, but if they are known, the abbreviated version is far shorter than the spelled-out one. The abbreviated version appeared on a single sheet of paper; the spelled-out one would likely require a scroll of some sort.

Right there, we can see one reason why initialisms are so prevalent in text messaging: They save time in entering the message. Full QWERTY-style keyboards are becoming more common as of 2009, such as in Apple iPhones or certain PDAs (Personal Digital Assistants), but most text messaging has been done on cell phones with fewer buttons than letters of the English alphabet. With both tiny keys on many devices and a requirement to press a button several times for a single letter, there is strong pressure to minimize the number of letters that are texted. What you see then in texting is that the idea of initialisms is by no means new to texting or the Internet, but the logistics of the current technology push people to use them to a greater degree. One can speculate that, as the ease of entering letters increases with new computer interfaces, the number of abbreviations would lessen as well.
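To put a number on that pressure, here is a small sketch that counts button presses on a standard phone keypad using multi-tap entry (press 6 three times for “o”, and so on). The letter layout is the usual one printed on phone keys; counting a space or any other character as one press, and ignoring the pause needed between two letters on the same key, are simplifying assumptions of mine.

```python
# Standard letter layout on a 12-key phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Presses needed per letter: its position on its key (a=1, b=2, c=3, ...).
PRESSES = {
    letter: position
    for letters in KEYPAD.values()
    for position, letter in enumerate(letters, start=1)
}


def multitap_presses(message):
    """Count key presses for a message typed with multi-tap.

    Assumption: anything that is not a letter (a space, say)
    costs one press.
    """
    return sum(PRESSES.get(ch, 1) for ch in message.lower())


print(multitap_presses("omg"))        # 5
print(multitap_presses("oh my god"))  # 16
```

Under these assumptions the initialism takes less than a third of the presses of the full phrase.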

Not all abbreviations in texting, however, are due to a desire to save time. Many people simply play with language when texting just as people play with language in speech. One might argue that initialisms like “lol” (laugh out loud) or “lmao” (laughing my ass off) once saved time in expressing amusement at something. However, is “ROTFLMAO” (Rolling on the Floor Laughing My Ass Off) really a time saver? It’s not clear just how much funnier something must be to trigger rolling around on the floor and losing one’s ass as compared to just losing one’s ass. Instead, people are stretching the abbreviations to see what they can get away with. Of course, to “get away with” an initialism like that, the person receiving the message must understand it. Many abbreviations are so common as to be understood by almost anyone with passing knowledge of texting or other Internet communication (such as lol). However, people also like to create special abbreviations that only their own friends or specific community can appreciate. This creates a sense of identity and shared experience with others as expressed through language modifications. JAKE NOTE: I AM ASSUMING THE SOCIO DISCUSSIONS HANDLED THIS; IT WOULD BE GOOD TO POINT THAT WAY SOMEHOW IF THEY DID.

An abbreviation can even be something of a political statement. On American political blogs, you can encounter the initialisms “IOKIYAR” and “IOKIYAD”. Unless you participate in American political debates, these are meaningless, but to those in the know, they stand for “It’s OK If You Are Republican” and “It’s OK If You Are a Democrat.” They are used when some behavior that was condemned by a partisan when the other party did it is defended when the partisan’s own party engages in it. In short, it’s an accusation of hypocrisy. While each initialism is indeed shorter than the full Standard English phrase, the very existence of the initialism also makes a statement. It suggests that the other party acts hypocritically so often that we had to create a special term for it. Initialisms then mark out both who we are and who the Other is.

The next abbreviation in our text message example marks out a different piece of knowledge, phonetic knowledge. “u” is not created from “you” by chopping off the first two letters. After all, thousands of words include the letter “u”. Instead, the texter knows the letter “u” and the word “you” are identical phonetically. Each would be transcribed phonetically [ju] JAKE NOTE: CAN WE POINT THEM BACK TO THE PHONETICS CHAPTER, ASSUMING THIS WAS COVERED THERE? and so one can stand in for the other.

Another common feature of text messaging is the abbreviation of words by omitting the vowels, here represented by “txt msgn”. However, why does this work? It turns out that it is far easier to remove vowels and retain an ability to recognize the word than to drop out the consonants and recognize the word. In the latter case, “text messaging” would become “e eai”, which is utterly incomprehensible. This actually reveals some rather deep properties of language. Two recent studies have attempted to examine differences between how consonants and vowels are perceived by speakers. In these studies, artificial languages were created where the consonants expressed certain patterns or the vowels expressed the same patterns. The question was whether participants in the experiments could find those patterns. It turned out that when the task involved memorizing and recognizing words, participants only succeeded when the clues were on the consonants; however, when the task involved learning abstract rules about the artificial language, participants only succeeded when the clues were on vowels. There appears to be a special connection between consonants and recognizing words. This possibility is even expressed in the natural writing systems of Semitic languages, such as Arabic and Hebrew. In those languages, the word is designated by the consonants of the word, while the vowels indicate its grammatical role in the sentence. In sum, texters seem to unconsciously know the consonant-to-word connection without ever being told.
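You can try the vowel-versus-consonant comparison yourself with a few lines of code. This is only a sketch: the function names are made up for illustration, and I am assuming the vowels are just a, e, i, o, and u (so y counts as a consonant). Real texters also trim further than this, as “msgn” shows, so the output will not match the example exactly.

```python
# Assumption: only a, e, i, o, u count as vowels; y is treated as a consonant.
VOWELS = set("aeiou")


def drop_vowels(text):
    """Keep consonants, spaces, and punctuation; remove vowels."""
    return "".join(ch for ch in text if ch.lower() not in VOWELS)


def drop_consonants(text):
    """Keep vowels, spaces, and punctuation; remove consonants."""
    return "".join(
        ch for ch in text
        if not ch.isalpha() or ch.lower() in VOWELS
    )


print(drop_vowels("text messaging"))      # txt mssgng
print(drop_consonants("text messaging"))  # e eai
```

The first result is still guessable; the second is not, which is the asymmetry the paragraph above describes.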

To sum up, a careful observation of Internet language can reveal much about human language. We’ve looked at issues of identity construction and ease of language production, and even found tentative evidence for a special connection between consonants and words in our internal psychology. Indeed, what we’ve been engaged in is a very rudimentary form of a subfield of linguistics called Corpus Linguistics. Corpus is the Latin word for “body”, and a linguistic corpus is a body of language data that can be analyzed to discover patterns in how people use their language. In the end, the basic features of text messaging are as old as language; however, the existing tools, such as a cell phone’s keypad, push those normal linguistic behaviors in new directions.

From Linguistic Change to Linguistic Preservation

Tuesday, December 16, 2008

Textbook - Section 1

Hi.

Basically, the main thing I'm doing now is finishing up the textbook chapter. Since I don't seem to be posting to the blog, I thought I could post parts of the chapter. It's written for the Intro to Ling class and is intended to be "followable" by 18- and 19-year-olds who in fact have no interest in language nor in attending class. In other words, I'm attempting to write it for a general population, which includes you guys too. The only real difference is that my chapter would be one of the very last chapters, so they've already read several chapters on various language issues. Due to this, you will periodically see in my text a JAKENOTE in all caps. JAKENOTES are where I ask the editor, Jake, if the students have really already encountered an idea in the text. I welcome any feedback about what's confusing, what's interesting, what's boring, and what's oddly worded.

Anyway, here you go:

Talking to Robots, Talking to Ourselves: Exploring Computers and Language

Usually, we think of technology as something that we as humans create. It’s an external tool we bring into existence to perform some task of our choosing. However, almost all breakthrough technologies also reveal a great deal about ourselves. Technologies become new ways to explore what it means to be human, even possibly changing what humans are as a result.

This idea has captured the imagination of storytellers ever since stories were told. The classic Greek myth of Daedalus and Icarus is just one example of using a fantastic technological innovation to make a statement about being human. The brilliant inventor Daedalus builds a set of wings for himself and his son Icarus. When they take to the skies, Icarus flies too close to the sun, melting the wings, and sending himself to his death. Mary Shelley’s 19th-century story of the creation of a monster by Dr. Frankenstein also makes the reader reflect on what humans will become if technology gives them the power to create life. However, never has the intersection of humanity and technology so fascinated us as it has since the invention of the computer. Computers after all are designed to solve the sorts of problems that only humans have been known to solve – advanced mathematics, novel data analysis, control of other tools, and more. Novels, television, music, painting, and films have all contemplated what separates humans from computers. Can robots be alive? Could a computer ever be a person? Do we have the moral right to create a living computer? What exactly divides a human from a computer?

As has been discussed in previous chapters, it is commonly argued that nothing is so distinctively human as language JAKE NOTE: HAS ANYONE ACTUALLY SAID THIS?. It is this amazing ability to speak and be understood that separates us from animals more than anything else. If that is the case, it is certainly worthwhile to take a look at language on computers. We communicate using computers so routinely now that for many people under 40, it is hard to remember what it was like not to be able to do so. We can learn a lot about ourselves and our language by studying Computer-Mediated Communication, and this will be the first topic of this chapter. What we will discover, however, is that computers are not transparent in this process. They change how we talk to each other. Indeed, the more we ask them to assist us in talking to each other, the more like us they have to become. We end up needing to talk to computers themselves. How this is possible will be our second main topic. Finally, we will look at how we can use computers to study our own language use. Multiple volumes could be written about every couple of paragraphs of this chapter, so a list of Further Readings is provided at the end.

omg u talk so diff in txt msgn!!!!! :)

We communicate in many different ways using computers: email, text messaging, internet bulletin boards, blogs, and web pages, just to name a few. Text messaging in particular has exploded in popularity in the past few years and yet is still new enough to many that it’s either a mystery or, for some, something to be feared or resisted. In an article entitled “I h8 txt msgs: How texting is wrecking our language”, John Humphrys doesn’t mince words:

It is the relentless onward march of the texters, the SMS (Short Message Service) vandals who are doing to our language what Genghis Khan did to his neighbours eight hundred years ago.

They are destroying it: pillaging our punctuation; savaging our sentences; raping our vocabulary. And they must be stopped.


While getting quite this excited is probably uncommon, it is quite common to make fun of texting, sorry txtn, in various ways. To assess whether there’s any merit to the claim that text messaging is destroying the world, we first, as linguists, need to understand just what’s different about it. So let’s take the title of this section and take a real look at it.

Example 1: omg u talk so diff in txt msgn!!!!! :)

Thursday, December 04, 2008

One Semester of Spanish Love Song

Perhaps I appreciate this as a language learning journal editor, or maybe it's just funny.

is now ABD

I am now officially All But Dissertation. Woot.

This one actually means something as I get to stop taking classes from here on out. So now I just have to finish this semester and, um, write a dissertation. Yip!