Posts in 'spark'

Ramandeep Singh Nanda

Writing Generic UDFs in Spark

Apache Spark offers the ability to write Generic UDFs. However, for an idiomatic implementation, there are a couple of things that one needs to keep in mind.

  1. Return a subtype of Option: Spark automatically treats None as null and can extract the value from Some ...
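
The Option point can be illustrated with a minimal sketch. It assumes Spark SQL is on the classpath and that a local session is acceptable; the names OptionUdfSketch, safeLength, and safeLengthUdf are hypothetical and do not come from the post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object OptionUdfSketch {
  // Hypothetical helper: None for null or blank input, Some(length) otherwise.
  def safeLength(s: String): Option[Int] =
    Option(s).map(_.trim).filter(_.nonEmpty).map(_.length)

  // Wrapping the function as a UDF: Spark writes None as null
  // and unwraps Some(n) to n in the result column.
  val safeLengthUdf = udf(safeLength _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("option-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // The second row is null and the third is blank; both should yield a null length.
    val df = Seq("spark", null, "  ").toDF("word")
    df.withColumn("length", safeLengthUdf(col("word"))).show()

    spark.stop()
  }
}
```

Keeping the logic in a plain function and wrapping it separately means the null handling can be unit tested without starting a Spark session.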

Ramandeep Singh Nanda

Testing Spark Dataframes

Testing Spark DataFrame transforms is essential and can be accomplished in a reusable manner. The way I generally accomplish that is to:

  • Read the expected and test DataFrames,
  • Invoke the desired transform, and
  • Calculate the difference between the DataFrames. The only caveat in calculating the difference is that in ...
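
The three steps above can be sketched roughly as follows. This is one possible shape under stated assumptions, not necessarily the approach the post goes on to describe: DataFrameDiffSketch, diff, and assertTransform are hypothetical names, and the difference is computed with Dataset.except in both directions.

```scala
import org.apache.spark.sql.DataFrame

object DataFrameDiffSketch {
  // Rows present in `expected` but not in `actual`, and the reverse.
  // Assumption: both DataFrames have the same columns in the same order.
  def diff(expected: DataFrame, actual: DataFrame): (DataFrame, DataFrame) = {
    val missing = expected.except(actual)    // expected rows the transform did not produce
    val unexpected = actual.except(expected) // produced rows that were not expected
    (missing, unexpected)
  }

  // Hypothetical reusable check: apply the transform, then require an empty diff.
  def assertTransform(input: DataFrame, expected: DataFrame)(
      transform: DataFrame => DataFrame): Unit = {
    val (missing, unexpected) = diff(expected, transform(input))
    val missingCount = missing.count()
    val unexpectedCount = unexpected.count()
    assert(
      missingCount == 0 && unexpectedCount == 0,
      s"missing rows: $missingCount, unexpected rows: $unexpectedCount"
    )
  }
}
```

In this sketch, except compares distinct rows, so duplicate counts are not checked; exceptAll (Spark 2.4 and later) keeps duplicates if that matters for the transform under test.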

Ramandeep Singh Nanda

Parallel Orchestration of Spark ETL Processing

I have been working a lot with Spark and Scala. I have really come to like Scala as a language because of its numerous advantages over Java, the foremost being that type classes and default method arguments do wonders for keeping an API simple. Also, idiomatic Scala code uses higher-order functions ...