Contributing#
Naming#
When referring to conda-store, always write it in all lowercase with a dash between the words: conda-store. It should remain lowercase even when beginning a sentence.
Development#
Significant effort has been put into simplifying the development and deployment process of conda-store. There is a Docker-based development workflow along with a non-containerized workflow if you are using Linux.
Containerized development#
Install the following dependencies before developing on conda-store:

- [Docker](https://docs.docker.com/engine/install/)
- docker-compose
To deploy conda-store, run the following command:

```shell
docker-compose up --build -d
```
Important

Many of the conda-store docker images are built/tested for amd64 (x86-64); there will be a performance impact when building and running on ARM architectures. Otherwise this workflow has been shown to build and run on macOS. Notice the architecture: amd64 within the docker-compose.yaml files.
Warning
If you're developing on a Mac and run into issues complaining about tcp 0.0.0.0:5000: bind: address already in use, you might need to deactivate the AirPlay Receiver service from the Sharing section in Control Center. Have a look at this discussion on Apple.com for more details.
The following resources will be available:

- conda-store web server running at http://localhost:5000
- MinIO S3 running at http://localhost:9000 with username admin and password password
- PostgreSQL running at localhost:5432 with username admin, password password, and database conda-store
- Redis running at localhost:6379 with password password
- JupyterHub running at http://localhost:8000 with any username and password test
On a fast machine this deployment should only take 10 or so seconds, assuming the docker images have been partially built before. If you are making changes to conda-store-server and would like to see those changes in the deployment, run:

```shell
docker-compose down -v # not always necessary
docker-compose up --build
```
Linux development#
Install the following dependencies before developing on conda-store:

- [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)

Install the development dependencies and activate the environment:

```shell
conda env create -f conda-store-server/environment-dev.yaml
conda activate conda-store-server-dev
```
Running conda-store in --standalone mode launches Celery as a subprocess of the web server:

```shell
python -m conda_store_server.server --standalone tests/assets/conda_store_standalone_config.py
```
Visit [localhost:5000](http://localhost:5000/)
Changes to API#
The REST API is considered somewhat stable. If any changes are made to the API, make sure to update the OpenAPI/Swagger specification in `docs/_static/openapi.json`. This may be downloaded from the `/docs` endpoint when running conda-store. Ensure that `c.CondaStoreServer.url_prefix` is set to `/` when generating the endpoints.
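As a minimal sketch of refreshing the spec from a locally running server: FastAPI conventionally serves the raw JSON spec at `/openapi.json` (the `/docs` page is the human-readable Swagger UI), so the exact path is an assumption to verify for your deployment.

```python
# Sketch: save the generated OpenAPI spec from a locally running conda-store.
# Assumption: the raw spec is served at /openapi.json (FastAPI's convention);
# adjust the URL if your deployment differs.
import urllib.request

urllib.request.urlretrieve(
    "http://localhost:5000/openapi.json",
    "docs/_static/openapi.json",
)
```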
Documentation#
Install the following dependencies before contributing to the
documentation.
- [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)
To build the documentation, install the development environment via Conda:

```shell
conda env create -f conda-store-server/environment-dev.yaml
conda activate conda-store-server-dev
```
Then go into the documentation directory docs and build the documentation:

```shell
cd docs
sphinx-build -b html . _build
```
Then open the documentation via your favorite web browser:

```shell
firefox _build/index.html
```
The documentation is primarily written in Markdown so as to make it easier to contribute.
Testing#
The conda-store repository contains two packages:

- conda-store-server/: the worker + web server responsible for the conda-store service
- conda-store/: the client which interacts with the service
conda-store#
Linting and formatting checks can be performed via hatch:

```shell
$ cd conda-store
$ hatch env run -e dev lint
```
Running integration tests. These tests are stateful, so you will need to clear the state if you have previously run the conda-store-server service on Docker:

```shell
$ cd conda-store
$ docker-compose down -v # ensure you've cleared state
$ docker-compose up --build
# wait until conda-store-server is running; check by visiting localhost:5000
$ pip install -e .
$ ./tests/unauthenticated-tests.sh
$ ./tests/authenticated-tests.sh
$ export CONDA_STORE_URL=http://localhost:5000/conda-store
$ export CONDA_STORE_AUTH=basic
$ export CONDA_STORE_USERNAME=username
$ export CONDA_STORE_PASSWORD=password
$ ./tests/shebang.sh
```
conda-store-server#
Linting and formatting checks can be performed via hatch:

```shell
$ cd conda-store-server
$ hatch env run -e dev lint
```
Checking that the package builds:

```shell
$ cd conda-store-server
$ hatch build
```
Running unit tests:

```shell
$ cd conda-store-server
$ pytest
```
Running integration tests. These tests are stateful, so you will need to clear the state if you have previously run the conda-store-server service on Docker:

```shell
$ cd conda-store-server
$ docker-compose down -v # ensure you've cleared state
$ docker-compose up --build
# wait until conda-store-server is running; check by visiting localhost:5000
$ hatch env run -e dev playwright-test
$ hatch env run -e dev integration-test
```
Release process#
Choose the <version> number. It should follow Semantic Versioning and the established pattern of v<x>.<y>.<z>.
Ensure that CHANGELOG.md
is up to date with all the changes since
the last release following the template provided within the markdown
file.
All docker images within docker/kubernetes
should be updated to the
release version. spec.template.spec.containers[0].image
is the path
within the YAML files.
Update the version number in conda-store-server/conda_store_server/__init__.py and conda-store/conda_store/__init__.py to reflect the release version. Once those changes have been made, make a commit titled bump to version <version>.
Finally, create a new release within the GitHub interface. Do this instead of a git tag since you can include release notes on the repository. The release should be titled Release <version> - <month>/<day>/<year> with the description being the changelog markdown for the particular release.
Once you have created a release, the GitHub Actions will build the release and make it available on PyPI, Conda, and DockerHub.
After the PyPI release, a release on conda-forge follows; it is located at conda-forge/conda-store-feedstock. A PR must be created that updates the feedstock to the released version <version>.
conda-store has two PyPI packages: conda-store-server and conda-store.

- Update recipe/meta.yaml with the new version: {% set version = "<version>" %}
- Update recipe/meta.yaml with the appropriate sha256 for each package. The sha256 can be found at https://pypi.org/project/conda-store/#files by clicking the view button.
Once the PR has been created, ensure that you request a rerender of the feedstock with the following comment: @conda-forge-admin please rerender. An example of this can be found in PR #2.
Architecture#
conda-store was designed with the idea of scalable enterprise management of reproducible Conda environments.
Configuration#
Traitlets is used for all configuration of conda-store. In the beginning, command line options were used, but eventually we learned that there were too many options for the user. Traitlets provides a Python configuration file that you can use to configure values of the applications. It is used for both the server and the worker. See tests/assets/conda_store_config.py for a full example.
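For illustration, a Traitlets configuration file is plain Python in which a pre-defined c object is assigned to. A minimal sketch follows; url_prefix is the only trait referenced elsewhere in this document, so check any other trait name against tests/assets/conda_store_config.py.

```python
# conda_store_config.py -- a minimal sketch of a Traitlets configuration file.
# `c` is injected by Traitlets when the file is loaded; no import is required.
c.CondaStoreServer.url_prefix = "/"
```

Both the server and the worker are pointed at such a configuration file at startup.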
Workers and server#
conda-store can be broken into two components. The workers have the following responsibilities:

- build Conda environments from Conda environment.yaml specifications
- build Conda pack archives
- build Conda docker images
- remove Conda builds
- modify symlinks to point the current environment at a given build
- generally, any tasks that can take an unbounded amount of time
All of the worker logic is in conda_store_server/build.py and conda_store_server/worker/*.py. Celery is used for managing tasks, so you will see the Celery tasks defined in conda_store_server/worker/tasks.py, which in turn usually call built-in CondaStore functions in conda_store_server/app.py or conda_store_server/build.py.
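Conceptually, each task in conda_store_server/worker/tasks.py is a thin Celery wrapper that defers to the long-running logic in CondaStore and conda_store_server/build.py. The sketch below illustrates the shape of such a task; the task and argument names are hypothetical, not the actual definitions.

```python
# Illustrative only -- not conda-store's actual task definitions.
from celery import shared_task


@shared_task
def task_build_conda_environment(build_id: int) -> None:
    # Look up the build record, then run the unbounded-duration work:
    # solving, downloading, and installing the environment.
    ...
```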
The web server has several responsibilities:
serve a UI for interacting with Conda environments
serve a REST API for managing Conda environments
serve a programmatic Docker registry for interesting docker-conda abilities
The web server is based on FastAPI. Originally Flask was chosen due to it being battle tested and because conda-store is not doing any special things with the web server. However, over time the ability to get robust input and output guarantees from the endpoints, along with auto documentation, made FastAPI appealing. The backend web app is defined in conda_store_server.server.app. There are several components to the server:

- UI :: conda_store_server/server/views/ui.py
- REST API :: conda_store_server/server/views/api.py
- registry :: conda_store_server/server/views/registry.py
Both the worker and the server need a connection to a SQLAlchemy-compatible database, Redis, and S3-compatible object storage. The S3 server is used to store all build artifacts, for example logs, docker layers, and the Conda-Pack tarball. The PostgreSQL database is used for storing all states on environments and builds, along with powering the conda-store web server UI, REST API, and Docker registry. Redis is used for keeping track of task state and results, along with enabling locks and realtime streaming of logs.
Terminology#
conda_environment = f(open("environment.yaml"), datetime.utcnow())
- namespace :: a way of providing scopes between environments. This prevents Joe's environment named data-science from colliding with Alice's environment named data-science.
- environment :: a pointer to a current build of a given specification
- specification :: a Conda environment.yaml file
- build :: an attempt of conda env install -f environment.yaml at a given point in time
In order to understand why we have the complicated terminology for an environment it helps to understand how Conda builds a given environment.
Reproducibility of Conda#
```yaml
name: example
channels:
  - defaults
  - conda-forge
dependencies:
  - python >=3.7
```
Suppose we have the given environment.yaml
file. How does Conda
perform a build?
1. Conda downloads channeldata.json from each of the channels, which lists the available architectures.
2. Conda then downloads repodata.json for each of the architectures it is interested in (specifically your compute architecture along with noarch). The repodata.json has fields like package name, version, and dependencies (see the sketch after these steps).
You may notice that the channels listed above do not have a URL. This is because, in general, a non-URL channel resolves to https://conda.anaconda.org/<channel-name>.
3. Conda then performs a solve to determine the exact version and sha256 of each package that it will download.
4. The specific packages are downloaded.
5. Conda does magic to fix the path prefixes of the install.
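To make steps (1) and (2) concrete, the metadata Conda works from can be fetched directly, using the https://conda.anaconda.org/<channel-name> convention noted above. This is a sketch only; repodata.json for a large channel can be hundreds of megabytes.

```python
# Fetch a channel's metadata the same way Conda does (illustrative sketch).
import json
import urllib.request

channel = "conda-forge"
channeldata = json.load(urllib.request.urlopen(
    f"https://conda.anaconda.org/{channel}/channeldata.json"))
print(channeldata["subdirs"])  # architectures (subdirs) the channel provides

repodata = json.load(urllib.request.urlopen(
    f"https://conda.anaconda.org/{channel}/noarch/repodata.json"))
# entries list package name, version, dependencies, and sha256
```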
There are two spots that introduce issues to reproducibility. The first issue is tracking when an environment.yaml file has changed. This can be easily tracked by taking a sha256 of the file. This is what conda-store does, but it sorts the dependencies to make sure it does not trigger a rebuild if the order of two packages changes in the dependencies list. In step (2), repodata.json is updated regularly. When Conda solves for a user's environment it tries to use the latest version of each package. Since repodata.json could be updated the next minute, the same solve for the same environment.yaml file can result in different solves.
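As a sketch of the first point, hashing a specification with its dependency list sorted gives a rebuild trigger that is insensitive to reordering. This is illustrative, not conda-store's exact implementation:

```python
# Hash an environment.yaml with dependencies sorted, so that reordering
# entries does not change the hash (illustrative sketch).
import hashlib
import json

import yaml  # pyyaml


def specification_sha256(path: str) -> str:
    with open(path) as f:
        spec = yaml.safe_load(f)
    # key=str copes with mixed entries such as {"pip": [...]}
    spec["dependencies"] = sorted(spec.get("dependencies", []), key=str)
    canonical = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```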
Authentication Model#
Authentication was modeled after JupyterHub for implementation. There is a base class conda_store_server.server.auth.Authentication. If you are extending and using a form of OAuth2, use conda_store_server.server.auth.GenericOAuthAuthentication. Similar to JupyterHub, all configuration is modified via Traitlets. Below is an example of setting up OAuth2 via JupyterHub for conda-store.
```python
c.CondaStoreServer.authentication_class = JupyterHubOAuthAuthentication
c.JupyterHubOAuthAuthentication.jupyterhub_url = "http://jupyterhub:8000"
c.JupyterHubOAuthAuthentication.client_id = "service-this-is-a-jupyterhub-client"
c.JupyterHubOAuthAuthentication.client_secret = "this-is-a-jupyterhub-secret"
```
Once a user is authenticated, a cookie or JSON Web Token is created to store the user credentials, to ensure that conda-store is as stateless as possible. At this current point in time, conda-store does not differentiate between a service and a user. Similar to JupyterHub, conda_store_server.server.auth.Authentication has an authenticate method. This method is the primary way to customize authentication. It is responsible for checking that the user credentials to login are correct as well as returning a dictionary following the schema conda_store_server.schema.AuthenticationToken. This stores a primary_namespace for a given authenticated service or user. In addition, a dictionary maps <namespace>/<name> keys to a set of roles. See the Authorization model to better understand the meaning of the key-to-set-of-roles mapping.
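A sketch of a custom authenticator follows. The class name is hypothetical, and the exact authenticate() signature and AuthenticationToken field names (role_bindings is assumed here for the <namespace>/<name>-to-roles mapping) should be verified against your conda-store version:

```python
# Illustrative custom authenticator (names/fields are assumptions to verify).
from conda_store_server import schema
from conda_store_server.server import auth


class TrustEveryoneAuthentication(auth.Authentication):  # hypothetical name
    def authenticate(self, request):
        # Validate the submitted credentials here; return None to reject.
        return schema.AuthenticationToken(
            primary_namespace="default",
            role_bindings={"default/*": {"admin"}},
        )
```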
Database#
Model#
At a high level the database model can be described in the image below.
Important things to note about the relationships (sketched in code below):

- An environment exists within a given namespace and always has a current build.
- A build belongs to a particular environment and has associated condapackage and buildartifact records.
- A buildartifact is a way for the database to keep track of external resources, for example s3 artifacts, filesystem directories, etc.
- A condapackage is a representation of a given Conda package which belongs to a given condachannel.
- A specification is the environment.yaml used in conda env create -f <environment.yaml>.
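The sketch below expresses these relationships as a toy SQLAlchemy model. It is illustrative only; the real models live in conda-store-server and differ in names and detail.

```python
# Toy SQLAlchemy model of the namespace/environment/build relationships
# described above (illustrative only).
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Namespace(Base):
    __tablename__ = "namespace"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)


class Environment(Base):
    __tablename__ = "environment"
    id = Column(Integer, primary_key=True)
    namespace_id = Column(Integer, ForeignKey("namespace.id"), nullable=False)
    name = Column(String, nullable=False)
    # an environment always points at its current build; use_alter breaks
    # the circular foreign-key dependency with the build table
    current_build_id = Column(Integer, ForeignKey("build.id", use_alter=True))


class Build(Base):
    __tablename__ = "build"
    id = Column(Integer, primary_key=True)
    environment_id = Column(Integer, ForeignKey("environment.id"), nullable=False)
    environment = relationship(
        Environment, foreign_keys=[environment_id], backref="builds"
    )
```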
The following will generate the database model shown below. It was generated from the examples/docker example. You'll see in the command that we are excluding several tables; these tables are managed by Celery.
```shell
pip install eralchemy # not available on conda-forge
eralchemy -i "postgresql+psycopg2://admin:password@localhost:5432/conda-store" \
  -x celery_tasksetmeta celery_taskmeta kombu_queue kombu_message alembic_version \
  -o output.png
```
Migrations#
conda-store relies on SQLAlchemy for ORM mapping, and on Alembic for DB migrations.
The procedure to modify the database is the following:
1. First, modify the ORM model according to the changes you want to make.
2. Edit the file conda-store-server/alembic.ini and replace the value for the entry sqlalchemy.url to match the connection URL of your database.
3. In your command line, run the following:

```shell
cd conda-store-server/conda_store_server
alembic revision --autogenerate -m "description of your changes"
```
You should have a new file in conda-store-server/conda_store_server/alembic/versions/. Review it thoroughly. It contains the alembic operations (op) that actually modify the database, either when upgrading (the upgrade function) or downgrading (downgrade). You can migrate your data within these upgrade/downgrade functions, for example:
```python
from alembic import op
from sqlalchemy import Column, INTEGER, TIMESTAMP, VARCHAR, func

# revision identifiers, used by Alembic.
revision = 'abcdef01234567'
down_revision = '987654321f0edc'
branch_labels = None
depends_on = None


def upgrade():
    # operations to modify the database structure
    # ...
    op.create_table(
        'new_table',
        Column('id', INTEGER, primary_key=True),
        Column('field1', VARCHAR(50), nullable=False),
        Column('field2', INTEGER),
        Column('timestamp', TIMESTAMP, server_default=func.now())
    )
    # ...
    op.execute('''INSERT INTO new_table (field1, field2)
                  SELECT field1, field2
                  FROM old_table''')
    # other operations to modify the database structure
    # ...


def downgrade():
    op.drop_table('new_table')
```
Once you're sure about the generated changes, you can apply them by running:

```shell
alembic upgrade head
```
Check your database: your changes should be reflected. If not, refer to Alembic's documentation.