Posts tagged 'Machine Learning'

Ramandeep Singh Nanda

Apache Zeppelin Notebooks

Apache Zeppelin provides a web UI where you can iteratively build Spark scripts in Scala, Python, etc. (with autocomplete support), run Spark SQL queries against Hive or other stores, and visualize the results of those queries or of Spark dataframes. This is somewhat akin to what IPython notebooks do for Python ...

Ramandeep Singh Nanda

Machine learning with Apache Spark, Scala and Hive

Apache Spark has an advanced DAG execution engine and supports in-memory computation. In-memory computation combined with DAG execution leads to far better performance than running MapReduce jobs. In this post, I will show an example of using linear regression with Apache Spark. The dataset is NYC-Yellow ...

Ramandeep Singh Nanda

Why you should use square root of Gini Index

In this post I will explain why you should use the square root of the Gini index when building decision tree classification models. In decision trees, we know that at every node we need to choose the feature that provides the best split, i.e. the feature that reduces the child nodes ...

Ramandeep Singh Nanda

Opening Box Office Weekend Prediction


We investigate whether tweets, specifically their volume and sentiment, can predict a film's opening weekend at the box office. We frame the target as a threshold of around 30 million dollars; more precisely, the threshold is the mean opening weekend across the entire dataset.
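As a minimal sketch of how such a threshold turns the problem into a binary classification target, the snippet below labels each film by whether its opening weekend exceeds the dataset mean. The function name `label_by_mean`, the strict "greater than" comparison, and the toy figures are assumptions made for illustration; they are not taken from the actual dataset or code.

```python
from statistics import mean


def label_by_mean(opening_weekends):
    """Return (threshold, labels) for a list of opening-weekend figures.

    The threshold is the mean of the whole list. A film gets label 1 when
    its opening weekend is strictly above the mean, otherwise 0. The strict
    comparison is an assumption; the text does not say how ties are handled.
    """
    threshold = mean(opening_weekends)
    labels = [1 if gross > threshold else 0 for gross in opening_weekends]
    return threshold, labels


# Made-up opening weekends in millions of dollars, for illustration only.
toy_grosses = [10.0, 20.0, 30.0, 60.0]
threshold, labels = label_by_mean(toy_grosses)
print(threshold, labels)
```

Deriving the threshold from the data rather than hard-coding 30 keeps the cut-off tied to the dataset mean, though the resulting classes need not be evenly balanced if the figures are skewed.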

About the Dataset

Labeled data for classifying tweets

This dataset was obtained from multiple sources and contains manually labeled tweets.