Kedro 0.16

Introduction

  • Introduction
    • What is Kedro?
    • Learning about Kedro
    • Assumptions
      • Official Python programming language website
      • List of free programming books and tutorials

Getting Started

  • Installation prerequisites
    • macOS / Linux
    • Working with virtual environments
      • conda
      • venv (instead of conda)
      • pipenv (instead of conda)
  • Installation guide
    • Installing Kedro
    • Verifying a successful installation
    • Installing workflow dependencies
      • Installing all dependencies
      • Installing dependencies related to the Data Catalog
        • Installing dependencies at the group level
        • Installing dependencies at the type level
  • Creating a new project
    • Create a new project interactively
    • Create a new project from a configuration file
    • Create a new project using starters
    • Working with your new project
      • Initialise a git repository
      • Amend project-specific dependencies
        • Using kedro build-reqs
        • Using kedro install
  • A “Hello World” example
    • Project directory structure
    • Project source code
      • Writing code
    • Project components
    • Data
    • Example pipeline
    • Configuration
      • Project-specific configuration
      • Sensitive or personal configuration
    • Running the example
    • Summary
  • Creating new projects with Kedro Starters
    • Introducing Kedro starters
    • Using starter aliases
    • List of official starters
    • Using a starter’s version
    • Using a starter in interactive mode
    • Using a starter with a configuration file

Tutorial

  • Typical Kedro workflow
    • Development workflow
      • 1. Set up the project template
      • 2. Set up the data
      • 3. Create the pipeline
      • 4. Package the project
    • Git workflow
      • Creating a project repository
      • Submitting your changes to GitHub
  • Kedro Spaceflights tutorial
    • Creating the tutorial project
      • Install project dependencies
      • Project configuration
  • Setting up the data
    • Adding your datasets to data
      • reviews.csv
      • companies.csv
      • shuttles.xlsx
    • Reference all datasets
    • Creating custom datasets
      • Contributing a custom dataset implementation
  • Creating a pipeline
    • Node basics
    • Assemble nodes into a pipeline
    • Persisting pre-processed data
    • Creating a master table
      • Working in a Jupyter notebook
      • Extending the project’s code
    • Working with multiple pipelines
    • Partial pipeline runs
      • Using pipeline name
      • Using tags
    • Using decorators for nodes and pipelines
      • Decorating the nodes
      • Decorating the pipeline
    • Kedro runners
  • Packaging a project
    • Add documentation to your project
    • Package your project
    • Manage project dependencies
    • Extend your project
    • What is next?

User Guide

  • Setting up Visual Studio Code
    • Advanced: For those using venv / virtualenv
    • Setting up tasks
    • Debugging
      • Advanced: Remote Interpreter / Debugging
    • Configuring the Kedro catalog validation schema
  • Setting up PyCharm
    • Set up Run configurations
    • Debugging
    • Advanced: Remote SSH interpreter
    • Configuring the Kedro catalog validation schema
  • Configuration
    • Local and base configuration
    • Loading
    • Additional configuration environments
    • Templating configuration
    • Parameters
      • Loading parameters
      • Specifying parameters at runtime
      • Using parameters
    • Credentials
      • AWS credentials
    • Configuring kedro run arguments
  • The Data Catalog
    • Using the Data Catalog within Kedro configuration
    • Specifying the location of the dataset
    • Using the Data Catalog with the YAML API
    • Adding parameters
    • Feeding in credentials
    • Loading multiple datasets that have similar configuration
    • Transcoding datasets
      • A typical example of transcoding
      • How does transcoding work?
    • Transforming datasets
      • Applying built-in transformers
      • Developing your own transformer
    • Versioning datasets and ML models
    • Using the Data Catalog with the Code API
    • Configuring a Data Catalog
    • Loading datasets
      • Behind the scenes
      • Viewing the available data sources
    • Saving data
      • Saving data to memory
      • Saving data to a SQL database for querying
      • Saving data in parquet
      • Creating your own dataset
  • Nodes
    • Creating a pipeline node
      • Node definition syntax
      • Syntax for input variables
      • Syntax for output variables
    • Tagging nodes
    • Running nodes
      • Applying decorators to nodes
      • Applying multiple decorators to nodes
  • Pipelines
    • Building pipelines
      • Tagging pipeline nodes
      • Merging pipelines
      • Fetching pipeline nodes
    • Developing modular pipelines
      • What are modular pipelines?
      • How do I create modular pipelines?
      • Modular pipeline structure
        • Ease of use and portability
      • How do I package a modular pipeline?
      • How do I distribute a modular pipeline?
        • Publishing the wheel file using twine
        • Uploading the wheel file to object storage
      • A modular pipeline example template
      • Configuration
      • Datasets
    • Connecting existing pipelines
    • Using a modular pipeline twice
    • Using a modular pipeline with different parameters
    • Bad pipelines
      • Pipeline with bad nodes
      • Pipeline with circular dependencies
    • Running pipelines
      • Runners
        • Using a custom runner
      • Asynchronous loading and saving
      • Running a pipeline by name
      • Modifying a kedro run
      • Applying decorators on pipelines
    • Running pipelines with IO
    • Outputting to a file
    • Partial pipelines
      • Partial pipeline starting from inputs
      • Partial pipeline starting from nodes
      • Partial pipeline ending at nodes
      • Partial pipeline from nodes with tags
      • Running only some nodes
      • Recreating missing outputs
  • Logging
    • Configure logging
    • Use logging
    • Logging for anyconfig
  • Advanced IO
    • Error handling
    • AbstractDataSet
    • Versioning
      • version namedtuple
      • Versioning using the YAML API
      • Versioning using the Code API
      • Supported datasets
    • Partitioned dataset
      • Partitioned dataset definition
        • Dataset definition
        • Partitioned dataset credentials
      • Partitioned dataset load
      • Partitioned dataset save
      • Incremental loads with IncrementalDataSet
        • Incremental dataset load
        • Incremental dataset save
        • Incremental dataset confirm
        • Checkpoint configuration
        • Special checkpoint config keys
  • Working with PySpark
    • Centralise Spark configuration in conf/base/spark.yml
    • Initialise a SparkSession in ProjectContext
    • Use Kedro’s built-in Spark datasets to load and save raw data
      • spark.SparkDataSet
      • spark.SparkJDBCDataSet
      • spark.SparkHiveDataSet
    • Use MemoryDataSet for intermediary DataFrames
    • Use MemoryDataSet with copy_mode="assign" for non-DataFrame Spark objects
    • Tips for maximising concurrency using ThreadRunner
  • Developing Kedro plugins
    • Overview
    • Initialisation
    • global and project commands
      • Suggested command convention
    • Working with click
    • Contributing process
    • Example of a simple plugin
    • Supported plugins
    • Community-developed plugins
  • Working with IPython and Jupyter Notebooks / Lab
    • Startup script
    • Working with context
      • Additional parameters for context.run()
    • Adding global variables
    • Working with IPython
      • Loading DataCatalog in IPython
    • Working from Jupyter
      • Idle notebooks
      • What if I cannot run kedro jupyter notebook?
      • Loading DataCatalog in Jupyter
      • Saving DataCatalog in Jupyter
      • Using parameters
      • Running the pipeline
      • Converting functions from Jupyter Notebooks into Kedro nodes
    • Extras
      • IPython loader
        • Installation
        • Prerequisites
  • Working with Databricks
    • Databricks Connect (recommended)
    • GitHub workflow with Databricks
  • Journal
    • Overview
      • Context journal record
      • Dataset journal record
    • Steps to manually reproduce your code and run the previous pipeline
  • Creating a new dataset
    • Scenario
    • Project setup
    • Problem
    • The anatomy of a dataset
    • Implement the _load method with fsspec
    • Implement the _save method with fsspec
    • Implement the _describe method
    • Bringing it all together
    • Integrating with PartitionedDataSet
    • Adding Versioning
    • Thread-safety consideration
    • Handling credentials and different filesystems
    • Contribute your dataset to Kedro
  • Hooks
    • Introduction
    • Concepts
      • Hook specification
      • Hook implementation
        • Registering your Hook implementations with Kedro
    • Under the hood
  • Debugging
    • Introduction
    • Debugging a node
    • Debugging a pipeline

Resources

  • Frequently asked questions
    • What is Kedro?
    • What are the primary advantages of Kedro?
    • How does Kedro compare to other projects?
      • Kedro vs workflow schedulers
      • Kedro vs other ETL frameworks
    • How can I find out more about Kedro?
      • Articles, podcasts and talks
      • Kedro used on real-world use cases
      • Community interaction
    • What is the data engineering convention?
    • What version of Python does Kedro support?
    • How do I upgrade Kedro?
    • What best practice should I follow to avoid leaking confidential data?
    • What is the philosophy behind Kedro?
    • Where do I store my custom editor configuration?
    • How do I look up an API function?
    • How do I build documentation for my project?
    • How do I build documentation about Kedro?
    • How can I use a development version of Kedro?
  • Kedro architecture overview
    • Building blocks
      • Project
        • kedro_cli.py
        • run.py
        • .kedro.yml
        • 00-kedro-init.py
        • ProjectContext
      • Framework
        • kedro cli
        • kedro/cli/cli.py
        • plugins
        • get_project_context()
        • load_context()
        • KedroContext
      • Library
        • ConfigLoader
        • Pipeline
        • AbstractRunner
        • DataCatalog
        • AbstractDataSet
  • Guide to CLI commands
    • Autocomplete
    • Global Kedro commands
    • Project-specific Kedro commands
      • kedro run
      • kedro build-reqs
      • kedro install
      • kedro test
      • kedro package
      • kedro build-docs
      • kedro jupyter notebook, kedro jupyter lab, kedro ipython
      • kedro jupyter convert
      • kedro lint
      • kedro activate-nbstripout
      • kedro catalog list
      • kedro pipeline list
      • kedro pipeline create
      • kedro pipeline package <pipeline_name>
      • kedro pipeline delete <pipeline_name>
    • Using Python
  • Linting your Kedro project
  • Images and icons
    • White background
      • Icon
      • Icon with text
    • Black background
      • Icon
      • Icon with text

API Docs

  • kedro
    • kedro.config
      • kedro.config.ConfigLoader
      • kedro.config.TemplatedConfigLoader
    • kedro.framework.hooks
      • Data Catalog Hooks
        • kedro.framework.hooks.specs.DataCatalogSpecs
      • Node Hooks
        • kedro.framework.hooks.specs.NodeSpecs
      • Pipeline Hooks
        • kedro.framework.hooks.specs.PipelineSpecs
    • kedro.io
      • Data Catalog
        • kedro.io.DataCatalog
      • Data Sets
        • kedro.io.LambdaDataSet
        • kedro.io.MemoryDataSet
        • kedro.io.PartitionedDataSet
        • kedro.io.IncrementalDataSet
        • kedro.io.CachedDataSet
        • kedro.io.DataCatalogWithDefault
      • Errors
        • kedro.io.DataSetAlreadyExistsError
        • kedro.io.DataSetError
        • kedro.io.DataSetNotFoundError
      • Base Classes
        • kedro.io.AbstractDataSet
        • kedro.io.AbstractVersionedDataSet
        • kedro.io.AbstractTransformer
        • kedro.io.Version
    • kedro.pipeline
      • kedro.pipeline.Pipeline
      • kedro.pipeline.node.Node
      • kedro.pipeline.node
      • kedro.pipeline.decorators.log_time
    • kedro.runner
      • kedro.runner.AbstractRunner
      • kedro.runner.SequentialRunner
      • kedro.runner.ParallelRunner
      • kedro.runner.ThreadRunner
    • kedro.framework.context
      • Base Classes
        • kedro.framework.context.KedroContext
      • Functions
        • kedro.framework.context.load_context
      • Errors
        • kedro.framework.context.KedroContextError
    • kedro.framework.cli
      • kedro.framework.cli.get_project_context
    • kedro.versioning
      • Base Classes
        • kedro.versioning.Journal
      • Modules
      • Errors
    • kedro.extras.datasets
      • Data Sets
        • kedro.extras.datasets.api.APIDataSet
        • kedro.extras.datasets.biosequence.BioSequenceDataSet
        • kedro.extras.datasets.dask.ParquetDataSet
        • kedro.extras.datasets.geopandas.GeoJSONDataSet
        • kedro.extras.datasets.json.JSONDataSet
        • kedro.extras.datasets.matplotlib.MatplotlibWriter
        • kedro.extras.datasets.holoviews.HoloviewsWriter
        • kedro.extras.datasets.networkx.NetworkXDataSet
        • kedro.extras.datasets.pandas.CSVDataSet
        • kedro.extras.datasets.pandas.ExcelDataSet
        • kedro.extras.datasets.pandas.AppendableExcelDataSet
        • kedro.extras.datasets.pandas.FeatherDataSet
        • kedro.extras.datasets.pandas.GBQTableDataSet
        • kedro.extras.datasets.pandas.HDFDataSet
        • kedro.extras.datasets.pandas.JSONDataSet
        • kedro.extras.datasets.pandas.ParquetDataSet
        • kedro.extras.datasets.pandas.SQLQueryDataSet
        • kedro.extras.datasets.pandas.SQLTableDataSet
        • kedro.extras.datasets.pickle.PickleDataSet
        • kedro.extras.datasets.pillow.ImageDataSet
        • kedro.extras.datasets.spark.SparkDataSet
        • kedro.extras.datasets.spark.SparkHiveDataSet
        • kedro.extras.datasets.spark.SparkJDBCDataSet
        • kedro.extras.datasets.tensorflow.TensorFlowModelDataset
        • kedro.extras.datasets.text.TextDataSet
        • kedro.extras.datasets.yaml.YAMLDataSet
    • kedro.extras.decorators
      • kedro.extras.decorators.retry_node.retry
      • kedro.extras.decorators.memory_profiler.mem_profile
    • kedro.extras.transformers
      • kedro.extras.transformers.memory_profiler.ProfileMemoryTransformer
      • kedro.extras.transformers.time_profiler.ProfileTimeTransformer
    • kedro.extras.logging
      • kedro.extras.logging.ColorHandler

Python Module Index

  • kedro
    • kedro.config
    • kedro.extras.datasets
    • kedro.extras.decorators
    • kedro.extras.logging
    • kedro.extras.transformers
    • kedro.framework.cli
    • kedro.framework.context
    • kedro.framework.hooks
    • kedro.io
    • kedro.pipeline
    • kedro.runner
    • kedro.versioning

© Copyright 2020, QuantumBlack Visual Analytics Limited
