Pyspark | DeveloperZen

Pyspark

Best Practices Writing Production-Grade PySpark Jobs

How to Structure Your PySpark Job Repository and Code

Using PySpark to process large amounts of data in a distributed fashion is a great way to handle large-scale, data-heavy tasks and gain business insights without sacrificing developer efficiency. In short, PySpark is awesome. However, while there are plenty of code examples out there, there isn't much information (that I could find) on how to build a PySpark codebase: writing modular jobs, building, packaging, handling dependencies, testing, etc.

Jan 13, 2017 - 8 min read
