Training Scripts
Generate Tokenized Dataset
This script generates a tokenized dataset from a HuggingFace dataset and stores it on disk.
Parameters:
--save_path SAVE_PATH
: Path to the save directory for the generated dataset
--min_tokens MIN_TOKENS
: Minimum number of tokens a context must contain in order to be included in the generated dataset
--num_processes NUM_PROCESSES
: Number of concurrent processes for downloading contexts
--tokenizer_name TOKENIZER_NAME
: HuggingFace name of the tokenizer used to tokenize the contexts. Currently only works with CodeLlama tokenizers
--dataset_name DATASET_NAME
: HuggingFace name of the dataset to load data from
num_files
: Number of files to generate. If multiprocessing is used, this has to be a multiple of num_processes
num_samples_per_file
: Number of tokenized contexts per file
Example:
python3 generate_tokenized_dataset.py --save_path ~/llama_tokenized_dataset --min_tokens 256 --num_processes 16 16 10000
Generates a tokenized dataset of samples containing at least 256 tokens each, using 16 processes. The dataset is saved into 16 files, each containing 10000 tokenized contexts.
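Internally, such a script boils down to a tokenize-and-filter loop. The following is a minimal sketch of that idea, not the actual implementation; the tokenizer and dataset names are placeholders standing in for --tokenizer_name and --dataset_name:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
import torch

# Placeholder names; the real script takes these from --tokenizer_name / --dataset_name.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
dataset = load_dataset("bigcode/the-stack-smol", split="train", streaming=True)

min_tokens = 256             # --min_tokens
num_samples_per_file = 10000

buffer, file_idx = [], 0
for sample in dataset:
    tokens = tokenizer(sample["content"], return_tensors="pt").input_ids[0]
    if tokens.numel() < min_tokens:
        continue             # drop contexts that are too short
    buffer.append(tokens)
    if len(buffer) == num_samples_per_file:
        # Flush one file of tokenized contexts to the save directory.
        torch.save(buffer, f"llama_tokenized_dataset/file_{file_idx}.pt")
        buffer, file_idx = [], file_idx + 1
```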
Train Autoencoder
This script trains a sparse autoencoder model.
Parameters:
--num_batches NUM_BATCHES
: Number of (autoencoder-sized) batches used to train the autoencoder
--dataset_path DATASET_PATH
: Path to the pretokenized dataset
--save_path SAVE_PATH
: Path to save the trained autoencoders to
--model_name MODEL_NAME
: HuggingFace name of the model used to obtain activations. Currently only works with CodeLlama models
--batch_size_llm BATCH_SIZE_LLM
: Batch size used to obtain model activations from the LLM
--batch_size_autoencoder BATCH_SIZE_AUTOENCODER
: Batch size used to train the autoencoder
--num_tokens NUM_TOKENS
: Number of tokens used to train the autoencoder
--device_llm DEVICE_LLM
: Device to load the LLM onto
--device_autoencoder DEVICE_AUTOENCODER
: Device to load the autoencoder onto
--learning_rate LEARNING_RATE
: Learning rate for autoencoder training
--l1_coefficient L1_COEFFICIENT
: L1 (sparsity) coefficient for autoencoder training
--act_vec_size ACT_VEC_SIZE
: Size of the activation vector fed into the autoencoder
--dict_vec_size DICT_VEC_SIZE
: Size of the dictionary vector produced by the autoencoder
--batches_between_ckpt BATCHES_BETWEEN_CKPT
: Number of batches to train the autoencoder before the model is saved as a checkpoint file and a feature-frequency image is generated
--num_batches_preload NUM_BATCHES_PRELOAD
: Buffer size of activation vectors for training. When the buffer is empty, it is refilled. A larger buffer results in more randomness during training
--neuron_resampling_method NEURON_RESAMPLING_METHOD
: Strategy for neuron resampling. Currently available: 'replacement', 'anthropic'
--neuron_resampling_interval NEURON_RESAMPLING_INTERVAL
: Number of (autoencoder-sized) batches to train between neuron-resampling steps
--normalize_dataset
: If set, all activation vectors in the dataset are normalized to an L2 norm of sqrt(n) (with n being the input dimension of the autoencoder)
--mlp_activations_hookpoint MLP_ACTIVATIONS_HOOKPOINT
: Hookpoint description for MLP activations, e.g. model.layers.{}.mlp.act_fn ({} is the layer index; see the hook sketch after this list)
--mlp_sublayer_hookpoint MLP_SUBLAYER_HOOKPOINT
: Hookpoint description for the MLP sublayer, e.g. model.layers.{}.mlp ({} is the layer index)
--attn_sublayer_hookpoint ATTN_SUBLAYER_HOOKPOINT
: Hookpoint description for the attention sublayer, e.g. model.layers.{}.self_attn ({} is the layer index)
layer_id
: ID of the layer from which the activations are obtained
layer_type
: Type of layer from which the activations are collected. Select 'attn_sublayer', 'mlp_sublayer' or 'mlp_activations'.
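The hookpoint strings are dotted module paths into the loaded HuggingFace model, with {} substituted by layer_id. A minimal sketch of how activations can be captured at such a path with a PyTorch forward hook (illustrative only, not the repository's code):

```python
import torch

def capture_activations(model, hookpoint_template: str, layer_id: int):
    """Register a forward hook at e.g. model.layers.{}.mlp.act_fn and
    collect that module's outputs during forward passes."""
    captured = []

    def hook(module, inputs, output):
        # Sublayer hookpoints may return tuples; keep the activation tensor.
        tensor = output[0] if isinstance(output, tuple) else output
        captured.append(tensor.detach())

    module = model.get_submodule(hookpoint_template.format(layer_id))
    handle = module.register_forward_hook(hook)
    return captured, handle

# Usage sketch: MLP activations of layer 19, matching the example below.
# captured, handle = capture_activations(llm, "model.layers.{}.mlp.act_fn", 19)
# llm(input_ids)   # forward pass appends to `captured`
# handle.remove()
```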
Example:
python3 train_autoencoder_from_tokens.py --num_batches 50000 --dataset_path ~/tokenized_dataset/ --save_path ~/autoencoders/l19_lr2e-4_spar0.5 --batch_size_llm 16 --batch_size_autoencoder 1024 --num_tokens 64 --device_llm cuda:0 --device_autoencoder cuda:1 --learning_rate 0.0002 --batches_between_ckpt 5000 --num_batches_preload 5000 19 mlp_activations
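The quantities behind these flags fit together as in this minimal training-step sketch: a linear encoder/decoder pair, a reconstruction loss, and an L1 penalty weighted by --l1_coefficient. The architecture details and sizes here are assumptions for illustration, not the repository's exact model:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, act_vec_size: int, dict_vec_size: int):
        super().__init__()
        self.encoder = nn.Linear(act_vec_size, dict_vec_size)
        self.decoder = nn.Linear(dict_vec_size, act_vec_size)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse dictionary activations
        return self.decoder(f), f

# Assumed sizes (--act_vec_size / --dict_vec_size).
sae = SparseAutoencoder(act_vec_size=11008, dict_vec_size=4 * 11008)
opt = torch.optim.Adam(sae.parameters(), lr=2e-4)  # --learning_rate

def train_step(acts: torch.Tensor, l1_coefficient: float = 0.5):
    # With --normalize_dataset, `acts` would first be scaled so each
    # vector has L2 norm sqrt(act_vec_size).
    recon, f = sae(acts)
    loss = (recon - acts).pow(2).mean() + l1_coefficient * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```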
Generate Interpretation Samples
This script samples activations of the sparse autoencoder's neurons and generates heuristics about their activation behavior.
Parameters:
--dataset_path DATASET_PATH
: Path of the tokenized dataset used to obtain interpretation samples
--autoencoder_path AUTOENCODER_PATH
: Path of the autoencoder model to analyze
--save_path SAVE_PATH
: Path to save the interpretation samples to
--target_model_name TARGET_MODEL_NAME
: Name of the target model. Currently, only CodeLlama models are supported
--target_model_device TARGET_MODEL_DEVICE
: Device of the target model
--autoencoder_device AUTOENCODER_DEVICE
: Device of the autoencoder
--log_freq_upper LOG_FREQ_UPPER
: Maximal log feature frequency at which a feature should be interpreted
--log_freq_lower LOG_FREQ_LOWER
: Minimal log feature frequency at which a feature should be interpreted
num_samples
: Number of interpretation samples to obtain
Comment on log_freq_upper and log_freq_lower
: The training process of an SAE produces the model as well as frequency histograms. Features that activate very often or very rarely might not be of interest and can be excluded from interpretation, which is very resource-intensive. The values of log_freq_upper and log_freq_lower refer to \(\log_{10}(\text{probability of activation of a neuron})\). So if log_freq_upper is \(-0.1\), features with a higher activation probability than \(10^{-0.1}\) are discarded.
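In code, this band filter amounts to a comparison on log10 activation frequencies, roughly like this (hypothetical variable names and values):

```python
import torch

# Hypothetical inputs: per-feature activation counts from the frequency
# histograms, plus the total number of tokens observed.
activation_counts = torch.tensor([120_000.0, 900.0, 3.0, 0.0])
num_tokens_seen = 1_000_000

log_freq = torch.log10(activation_counts / num_tokens_seen)  # -inf for dead features

log_freq_upper, log_freq_lower = -0.1, -6.0
# Keep only features whose log10 activation probability lies inside the band.
keep = (log_freq <= log_freq_upper) & (log_freq >= log_freq_lower)
```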
Example:
python3 generate_interpretation_samples.py --dataset_path ~/tokenized_dataset/ --autoencoder_path ~/l19_lr2e-4_spar0.5/50000.pt --save_path ~/interp_samples_l19.pt --target_model_device cuda:3 --autoencoder_device cuda:2 10000
Interpret Autoencoder
This script interprets the sparse autoencoder. It can only be launched with DeepSpeed!
Parameters:
--dataset_path DATASET_PATH
: Path of the tokenized dataset used to obtain interpretation samples
--interpretation_samples_path INTERPRETATION_SAMPLES_PATH
: Path of the saved interpretation samples
--interpretation_model_name INTERPRETATION_MODEL_NAME
: Name of the interpretation model. Currently, only CodeLlama models are supported
--num_gpus NUM_GPUS
: Number of GPUs to use
--autoencoder_path AUTOENCODER_PATH
: Path to the autoencoder to interpret
--num_interpretation_samples NUM_INTERPRETATION_SAMPLES
: Number of interpretation samples to use
--num_simulation_samples NUM_SIMULATION_SAMPLES
: Number of simulation samples to use
--local_rank LOCAL_RANK
: Local rank (set by the DeepSpeed launcher)
--ssl_cert SSL_CERT
: Path to the SSL certificate of the ElasticSearch server
server_address
: Address of the ElasticSearch server (e.g. https://DOMAIN:PORT)
api_key
: API key for the ElasticSearch server
Example:
deepspeed interpret_autoencoder_deepspeed.py --dataset_path ~/tokenized_dataset --interpretation_samples_path ~/interp_samples_l19.pt --num_gpus 4 --autoencoder_path ~/l19_lr2e-4_spar0.5/50000.pt 127.0.0.1:1234 API_KEY
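The last three connection parameters map onto the official ElasticSearch Python client roughly as follows. This is a sketch with placeholder values, not the script's actual code; the index name and certificate path are made up:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://127.0.0.1:1234",        # server_address
    api_key="API_KEY",               # api_key
    ca_certs="/path/to/cert.crt",    # --ssl_cert
)
# Hypothetical write of one interpretation result.
es.index(index="interpretations", document={"feature": 0, "explanation": "..."})
```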