Changelog

0.9.3 (2019-07-08)

Added a view mechanism that allows the same dataset to be reused for different purposes, e.g. as training set and test set.
Added dataset randomization, which randomizes the data internally so that shuffle=True is no longer needed with the fit method. This allows randomized data to be fetched in coherent chunks from HDF5 files, which improves access time.
Added a lazy loading mechanism for DNA and BED files, which defers the determination of the genome size to the dataset creation phase and skips it when loading cached files, which improves reload time.
Improved the caching logic to maximize the reusability of datasets. For example, when the whole genome is loaded, the data can later be reloaded with different binsizes.
Variant effect prediction functionality added.
Improved efficiency for loading coverage from an array.
Added axis option to ReduceDim
Added Track classes to improve the flexibility of plotGenomeTrack
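
The dataset randomization entry above can be illustrated with a minimal NumPy sketch. This is not Janggu's implementation: the helper names `store_randomized` and `contiguous_batches` are hypothetical, and an in-memory array stands in for an HDF5 dataset. The idea shown is that rows are permuted once at creation time, so that later reads can use contiguous slices and still return randomized data.

```python
import numpy as np

def store_randomized(data, seed=0):
    """Permute the rows once, at dataset creation time (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    return data[order], order

def contiguous_batches(stored, batch_size):
    """Yield batches as contiguous slices, which suits chunked on-disk storage."""
    for start in range(0, len(stored), batch_size):
        yield stored[start:start + batch_size]

data = np.arange(10)
stored, order = store_randomized(data, seed=42)
batches = list(contiguous_batches(stored, batch_size=4))
```

Because the order is fixed at creation time, every epoch sees the same permutation, which is the trade-off compared with reshuffling on each pass.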

Removed the HTSeq dependency in favour of pybedtools for parsing BED, GFF, etc. This introduces the requirement to have bedtools installed on the system, but it allows BED-like files to be parsed faster and more conveniently.
Internal rearrangements for GenomicArray with store_whole_genome=False. The data is now stored as one array in a dict-like handle under the dummy key ‘data’, rather than in a fragmented fashion with the genomic intervals as keys and the respective coverages as values. This makes storage and processing more efficient.
Bugfix: added conditions property to wrapper datasets.

Added various features and bug fixes:

Changes in janggu.data

Added new dataset wrapper to remove NaNs: NanToNumConverter
Added new dataset wrappers for data augmentation: RandomOrientation, RandomSignalScale
Adapted ReduceDim wrapper: added aggregator argument
plotGenomeTrack: added figsize option
plotGenomeTrack: added other plot types, including heatmap and seqplot.
plotGenomeTrack: refactoring of internal code
Bioseq bugfix: Fixed issue for reverse complementing N’s in the sequence.
GenomicArray: condition, order and resolution are no longer read from the cache but taken from the arguments, to avoid inconsistencies
Normalization of Cover can handle a list of normalizer callables which are applied in turn
Normalization and Transformation: Added PercentileTrimming, RegionLengthNormalization, LogTransform
ZScore and ZScoreLog do not apply RegionLengthNormalization by default anymore.
Added version-aware caching of datasets in janggu.data
Added copy method for janggu datasets.
split_train_test refactored
removed obsolete transformations attribute from the datasets
Adapted the documentation
Refactoring according to suggestions from isort and pylint
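
The entry on normalizing Cover with a list of callables can be pictured as a simple chain. The sketch below is an assumption-laden illustration, not Janggu's code: `percentile_trim`, `log_transform` and `apply_in_turn` are hypothetical stand-ins and do not reproduce the behaviour of PercentileTrimming or LogTransform.

```python
import numpy as np

def percentile_trim(values, percentile=99):
    """Cap values above the given percentile (illustrative stand-in)."""
    cap = np.percentile(values, percentile)
    return np.minimum(values, cap)

def log_transform(values):
    """Apply log(1 + x) element-wise (illustrative stand-in)."""
    return np.log1p(values)

def apply_in_turn(values, normalizers):
    """Feed the output of each callable into the next one."""
    for normalize in normalizers:
        values = normalize(values)
    return values

coverage = np.array([0.0, 1.0, 3.0, 1000.0])
normalized = apply_in_turn(coverage, [percentile_trim, log_transform])
```

The order of the list matters: trimming before the log transform caps the outlier first, whereas the reverse order would compute the percentile on log-scaled values.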

Changes in janggu

Added input_attribution via integrated gradients for feature importance assignment
Performance scoring by name for Janggu.evaluate for a number of common metrics, including ROC, PRC, correlation, variance explained, etc.
training.log is stored by default for each model
Added model_from_json, model_from_yaml wrappers
inputlayer decorator only instantiates Input layers if inputs == None, which makes the use of inputlayer less restrictive when using nested functions
Added create_model method to create a keras model directly
Adapted the documentation
Refactoring according to suggestions from isort and pylint
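
For readers unfamiliar with integrated gradients, the technique behind input_attribution, here is a minimal NumPy sketch. It does not use Janggu or Keras: the toy `model`, its analytic gradient `model_grad`, and the function `integrated_gradients` are all assumptions made for illustration. Gradients are averaged along a straight path from a baseline to the input and scaled by the input difference.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

weights = np.array([1.0, -2.0, 0.5])

def model(x):
    """Toy differentiable model: weighted sum of squared inputs."""
    return float(np.sum(weights * x ** 2))

def model_grad(x):
    """Analytic gradient of the toy model."""
    return 2.0 * weights * x

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(model_grad, x, baseline)
```

A useful sanity check is that the attributions sum to the difference between the model output at the input and at the baseline.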

Bugfix for ROIs that reach beyond the chromosome when loading Bioseq datasets. Now, zero-padding is performed for intervals that stretch over the sequence ends.
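
The zero-padding behaviour can be sketched on a toy one-dimensional array. The function `fetch_with_zero_padding` is hypothetical and only illustrates the idea; how Janggu encodes Bioseq data internally is not shown here.

```python
import numpy as np

def fetch_with_zero_padding(chrom_values, start, end):
    """Return the values in [start, end), with zeros outside the chromosome."""
    out = np.zeros(end - start, dtype=chrom_values.dtype)
    # Clip the requested interval to the part that lies on the chromosome.
    src_start = max(start, 0)
    src_end = min(end, len(chrom_values))
    if src_start < src_end:
        out[src_start - start:src_end - start] = chrom_values[src_start:src_end]
    return out

chrom = np.arange(1, 6)  # toy chromosome of length 5
right_overhang = fetch_with_zero_padding(chrom, 3, 8)
left_overhang = fetch_with_zero_padding(chrom, -2, 2)
outside = fetch_with_zero_padding(chrom, 10, 12)
```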

Updated abstract, added logo
Utility: janggutrim command line tool for cutting bed file regions to avoid unwanted rounding effects. If rounding issues are detected an error is raised.
Caching mechanism revisited. Caching of datasets is based on determining the sha256 hash of the dataset. If the data or some parameters change, the files are automatically reloaded. Consequently, the arguments overwrite and datatags become obsolete and have been marked for deprecation.
Refactored access of GenomicArray
Added ReduceDim wrapper to convert a 4D Cover object to a 2D table-like object.
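
The hash-based caching entry above can be illustrated with the standard library. This is a sketch under assumptions, not Janggu's caching code: the function `cache_key` and the parameter names in the example dictionaries are hypothetical. It shows why a change in the data or in a parameter leads to a different cache entry.

```python
import hashlib
import json

def cache_key(file_bytes, params):
    """Derive a cache key from the data content and the loading parameters."""
    digest = hashlib.sha256()
    digest.update(file_bytes)
    # Serialize parameters deterministically so equal settings give equal keys.
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()

bed_content = b"chr1\t0\t100\n"
key_a = cache_key(bed_content, {"binsize": 200, "resolution": 1})
key_b = cache_key(bed_content, {"resolution": 1, "binsize": 200})
key_c = cache_key(bed_content, {"binsize": 100, "resolution": 1})
```

Here `key_a` and `key_b` are identical because only the argument order differs, while `key_c` differs because the binsize changed, so a lookup with it would miss the cache and trigger a reload.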

Fixed issues for loading SparseGenomicArray
Made GenomicIndexer.filter_by_region aware of flank
Fixed a BedLoader issue with partially overlapping ROIs and BED files by using filter_by_region.
Adapted classifier, license and keywords in setup.py
Fixed hyperlinks