When two individuals who are bi- or multilingual in an overlapping set of languages communicate, they tend to switch seamlessly and effortlessly between the languages (codes) they share. Such code-switching is most prominent in spoken conversation, but it also occurs frequently in social media texts, which tend to be informal and conversational in nature. The aim of this project is to apply various machine learning methods to code-switched texts from Twitter, Facebook or WhatsApp, for example to identify the language of each word or to annotate the texts with part-of-speech tags or utterance boundaries.


