Hope you are doing well and enjoying the summer!
I am writing to you again to show you a new project I have been working on for the past month. It is a tiny bit similar to SpamBrainz; you can find a demo here (demo) and the Git repository here (git).
This is an NLP project which retrieves the name of a movie from a tweet! It is quite a challenging task, as a tweet can contain several strings that match movie names, and we need to extract only those actually used to refer to a movie. For example, consider one tweet, ‘I hate frozen food’, and another, ‘I am in love with songs from Frozen #newFoundLove’. Our model should ideally return Frozen for the second tweet but nothing for the first one.
We developed a pipeline that starts by tokenizing tweets, normalizing the resulting tokens, and then identifying candidates by matching them against our movie gazetteer. The next step is to classify each candidate as a movie or not, based on features like n-grams, orthographic projections, etc. We used an SVM model (because we were reproducing this paper for our coursework) with the specified parameters.
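To give a rough idea of the first stages, here is a minimal Python sketch of tokenizing, normalizing, gazetteer matching and a few context/orthographic features. It is a simplified illustration, not the code from the repository: the toy gazetteer, the regular expression, and the names `tokenize`, `normalize`, `find_candidates` and `features` are all assumptions made up for this example, and the SVM step is left out.

```python
import re

# Hypothetical toy gazetteer; a real one would hold many thousands of titles.
GAZETTEER = {"frozen", "dallas buyers club"}
# Longest title, in tokens, so we know how wide a window to scan.
MAX_TITLE_LEN = max(len(title.split()) for title in GAZETTEER)

# Illustrative pattern: hashtags, mentions, and plain words (with an optional apostrophe part).
TOKEN_RE = re.compile(r"#\w+|@\w+|\w+(?:'\w+)?")


def tokenize(tweet):
    """Split a tweet into raw tokens, keeping the original casing."""
    return TOKEN_RE.findall(tweet)


def normalize(token):
    """Lowercase a token and strip a leading '#' or '@'."""
    return token.lstrip("#@").lower()


def find_candidates(tokens):
    """Return (start, end) token spans whose normalized text is in the gazetteer."""
    norm = [normalize(t) for t in tokens]
    spans = []
    for start in range(len(norm)):
        last_end = min(start + MAX_TITLE_LEN, len(norm))
        for end in range(start + 1, last_end + 1):
            if " ".join(norm[start:end]) in GAZETTEER:
                spans.append((start, end))
    return spans


def features(tokens, start, end):
    """Build a small feature dict for one candidate span (assumed feature set)."""
    surface = tokens[start:end]
    return {
        # Neighbouring tokens act as simple context n-gram features.
        "prev_token": normalize(tokens[start - 1]) if start > 0 else "<s>",
        "next_token": normalize(tokens[end]) if end < len(tokens) else "</s>",
        # Orthographic cue: is every word of the candidate capitalized?
        "all_title_case": all(t[:1].isupper() for t in surface),
        "is_hashtag": surface[0].startswith("#"),
    }


if __name__ == "__main__":
    for tweet in ("I hate frozen food",
                  "I am in love with songs from Frozen #newFoundLove"):
        toks = tokenize(tweet)
        for start, end in find_candidates(toks):
            print(toks[start:end], features(toks, start, end))
```

Note that both example tweets yield a candidate at this stage, since "frozen" is in the gazetteer either way. It is the classifier's job to use features such as the surrounding words and the capitalization to accept the second and reject the first.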
How to use the demo? Enter a tweet like
RT @LeighMcManus1 : 85 minutes into Dallas Buyers Club and I 'm only realising now that Jarred Leto is the tranny
in the text input and click Retrieve. The demo will then show all the intermediate steps of the pipeline.
Let me know if you have any questions or feedback!