Running PySpark in a Virtual Environment
For many PySpark applications, it is sufficient to use --py-files to specify dependencies. However, there are times when --py-files is inconvenient, such as the following scenarios:
- A large PySpark application has many dependencies, including transitive dependencies.
- A large application needs a Python package that requires C code to be compiled before installation.
- You want to run different versions of Python for different applications.

In these situations, you can create a virtual environment to serve as an isolated Python runtime. HDP supports VirtualEnv for PySpark in both local and distributed environments, which eases the transition from a local environment to a distributed one.

Note: This feature is currently supported only in YARN mode.
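
To make the contrast between the two approaches concrete, the following is a minimal sketch in Python that assembles a spark-submit command line for each case. The helper `build_spark_submit`, the file names (`app.py`, `deps.zip`, `helpers.py`, `requirements.txt`), and the virtualenv binary path are hypothetical. The `spark.pyspark.virtualenv.*` property names are assumptions about how HDP exposes this feature and should be checked against the documentation for your HDP release before use.

```python
import shlex


def build_spark_submit(app, py_files=(), conf=None, master="yarn", deploy_mode="client"):
    """Return a spark-submit argument list (hypothetical helper, illustration only)."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if py_files:
        # --py-files takes a comma-separated list of .zip, .egg, or .py files.
        cmd += ["--py-files", ",".join(py_files)]
    # Emit each configuration property as a separate --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# Simple case: a few pure-Python dependencies shipped with --py-files.
simple_cmd = build_spark_submit("app.py", py_files=["deps.zip", "helpers.py"])

# Virtual environment case on YARN.
# ASSUMPTION: these property names and values are not confirmed by this page;
# verify them for your HDP version.
venv_conf = {
    "spark.pyspark.virtualenv.enabled": "true",
    "spark.pyspark.virtualenv.type": "native",
    "spark.pyspark.virtualenv.requirements": "requirements.txt",
    "spark.pyspark.virtualenv.bin.path": "/usr/bin/virtualenv",
}
venv_cmd = build_spark_submit("app.py", conf=venv_conf)

# Print shell-quoted command lines for inspection; nothing is executed here.
print(shlex.join(simple_cmd))
print(shlex.join(venv_cmd))
```

The sketch only builds and prints the commands, so it can be reviewed without a cluster. Both commands default to `--master yarn`, in line with the note above.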


