Projects
Students will work in groups on a project. Each project uses Hadoop to solve a natural language processing problem.
Each group will be largely self-directed, but can call upon the organisers for help.
Students can also propose their own projects.
Project 1: Finding breaking news in Twitter
Can we find breaking news stories in Twitter?
Project 2: Which groups of people use Twitter?
Efficiently cluster authors on Twitter into interesting groups.
Project 3: Computing Kneser-Ney smoothed language models
Compute a K-N smoothed language model using large volumes of data.
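As a starting point, here is a minimal single-machine sketch of an interpolated Kneser-Ney bigram model. It is not a Hadoop implementation and not necessarily the formulation the project will use: the function name `kneser_ney_bigram`, the fixed discount of 0.75, and the restriction to bigrams are all assumptions made for illustration. The counts it gathers (bigram counts, distinct followers, distinct predecessors) are the quantities a distributed version would need to compute at scale.

```python
from collections import Counter, defaultdict


def kneser_ney_bigram(tokens, discount=0.75):
    """Return prob(w, v) estimating P(w | v) with interpolated Kneser-Ney.

    Assumption: a single fixed discount; real systems often estimate it.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_total = Counter()     # c(v, .): total count of bigrams starting with v
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct words seen before w
    for (v, w), count in bigrams.items():
        context_total[v] += count
        followers[v].add(w)
        preceders[w].add(v)
    n_bigram_types = len(bigrams)

    def prob(w, v):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(preceders.get(w, ())) / n_bigram_types
        total = context_total.get(v, 0)
        if total == 0:
            # Unseen context: fall back to the continuation distribution.
            return p_cont
        discounted = max(bigrams.get((v, w), 0) - discount, 0.0) / total
        backoff_weight = discount * len(followers[v]) / total
        return discounted + backoff_weight * p_cont

    return prob
```

For a seen context, the probabilities over the vocabulary sum to one, which is a useful sanity check when porting the counting steps to MapReduce.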
Project 4: PageRank for Twitter Users
Given the complete follower graph of Twitter users, compute PageRank for each author. Which author has the highest PageRank?
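To make the task concrete, the following is a small in-memory sketch of PageRank by power iteration. It is only an illustration under stated assumptions: the function name `pagerank`, the input format (a dict mapping each user to the list of users they follow), the damping factor of 0.85, and the fixed iteration count are all hypothetical choices, and a Hadoop version would instead express each iteration as a MapReduce pass over the edge list.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Compute PageRank over a follower graph.

    follows: dict mapping a user to the users they follow (assumed format).
    An edge u -> v means u follows v, so v receives rank from u.
    """
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        # Users who follow nobody would leak rank; spread theirs evenly.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank
```

A call such as `max(ranks, key=ranks.get)` on the returned dict then answers the question of who ranks highest.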
Project 5: Distributed discriminative supervised machine learning
Discriminative machine learning approaches often produce the best performance for many problems. Can we get them to run on gigantic amounts of data?
Project 6: Finding useful information in a sea of garbage
Often we have massive volumes of low-grade training material (e.g. data from the Web). Can we spot items in it that are likely to improve performance on some task (e.g. reduce language model perplexity)?
Project 7: Clustering words into classes
Cluster words into useful classes using the Exchange Algorithm.
Project 8: Your own project
Don't like our suggestions? Then simply propose your own! Contact us and we will try to arrange for this to happen.