Posts – Page 2

Ramandeep Singh Nanda

Impala vs Hive vs RDBMS

Hive or Impala?

Hive and Impala both support SQL operations, but Impala's performance is far superior to Hive's. Although the Spark SQL engine and HiveContext now make Hive queries significantly faster, Impala still performs better. The reason that ...

Java8: Decorating with Functional Programming and Generics

The Idea

Java 8 introduced functional programming support, a powerful feature that was missing from earlier versions. One benefit of functional programming is that it can be used to implement the decorator pattern without inheritance. One common requirement is to implement some kind of ...
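To make the idea concrete, here is a minimal sketch of decoration through function composition rather than inheritance. It is not the code from the post: the class `FunctionalDecorator` and the helpers `withLogging` and `withDefault` are hypothetical names chosen for illustration, and only `java.util.function.Function` from the standard library is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical example: all names here are illustrative, not taken from the post.
final class FunctionalDecorator {

    // Decorator 1: wraps any function so that each call is recorded in the given log.
    static <T, R> Function<T, R> withLogging(Function<T, R> target, String name, List<String> log) {
        return input -> {
            R result = target.apply(input);
            log.add(name + "(" + input + ") = " + result);
            return result;
        };
    }

    // Decorator 2: wraps a function so that a null input yields a fallback value
    // instead of being passed to the target.
    static <T, R> Function<T, R> withDefault(Function<T, R> target, R fallback) {
        return input -> input == null ? fallback : target.apply(input);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Function<String, Integer> length = String::length;

        // Decorators stack by plain composition; no subclass of the target is needed.
        Function<String, Integer> safeLength = withDefault(length, 0);
        Function<String, Integer> decorated = withLogging(safeLength, "length", log);

        System.out.println(decorated.apply("hive")); // prints 4
        System.out.println(decorated.apply(null));   // prints 0
        System.out.println(log);
    }
}
```

Because each decorator takes a `Function` and returns a `Function`, any combination can be layered at the call site, and generics keep the wrappers reusable for any input and output types.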

Retrofit 2.0 Conditional Authentication

You might run into a scenario that requires conditional authentication with Retrofit 2.0.

This post provides an example of integration with the Lyft API. With the Lyft API, we first need to authenticate against the oauth/token endpoint to obtain the OAuth token ...

Why you should use square root of Gini Index

In this post I will explain why you should use the square root of the Gini index when building decision tree classification models. In decision trees, we know that at every node we need to choose the feature that provides the best split, i.e. the feature that reduces the child nodes ...
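As background for the split criterion mentioned above, the following is a minimal sketch of plain Gini impurity and the size-weighted impurity of a candidate split. It covers only the standard Gini index, not the square-root variant the post argues for; the class `GiniSplit`, its method names, and the class counts in `main` are all hypothetical and chosen for illustration.

```java
// Hypothetical example: names and counts are illustrative, not taken from the post.
final class GiniSplit {

    // Total number of samples across all classes in a node.
    static int sum(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) {
            total += c;
        }
        return total;
    }

    // Gini impurity of a node: 1 - sum over classes of p_k^2.
    static double gini(int[] classCounts) {
        int total = sum(classCounts);
        if (total == 0) {
            return 0.0; // an empty node is treated as pure
        }
        double sumSquares = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    // Impurity of a binary split: child impurities weighted by child size.
    static double weightedChildGini(int[] left, int[] right) {
        int nLeft = sum(left);
        int nRight = sum(right);
        int n = nLeft + nRight;
        if (n == 0) {
            return 0.0;
        }
        return ((double) nLeft / n) * gini(left) + ((double) nRight / n) * gini(right);
    }

    public static void main(String[] args) {
        int[] parent = {5, 5};
        // Two candidate features splitting the same ten samples.
        double splitA = weightedChildGini(new int[] {4, 1}, new int[] {1, 4});
        double splitB = weightedChildGini(new int[] {3, 2}, new int[] {2, 3});

        System.out.println("parent impurity:  " + gini(parent));
        System.out.println("split A impurity: " + splitA);
        System.out.println("split B impurity: " + splitB);
        // The feature with the lower weighted child impurity gives the better split.
        System.out.println("better split: " + (splitA < splitB ? "A" : "B"));
    }
}
```

In this made-up example, split A separates the classes more cleanly than split B, so its weighted child impurity is lower and it would be preferred at this node.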

Opening Box Office Weekend Prediction

Introduction

We investigate whether tweets, specifically their volume and sentiment, can predict a film's opening-weekend box office. We target a threshold of around 30 million dollars; more precisely, the threshold is the mean opening-weekend gross across the entire dataset.

About the Dataset

Labeled data for classifying tweets

This dataset was obtained from multiple sources and contains manually labeled data.