App Documentation
Subpackages
Submodules
app.environment_settings
Module with simulation and project related variables.
This module holds multiple constant variables that are used throughout the simulation’s lifetime, including initialization and execution.
Note
To configure the number of available Network Nodes in the system, the initial size of a file Cluster Group that works on the durability of a file, the way files are distributed among the clusters’ nodes at the start of a simulation, and the actual name of the file whose persistence is being simulated, you should create a simulation file using this script and follow the respective instructions. To run the script, type in your command line terminal:
$ python simfile_generator.py --file=filename.json
It is also strongly recommended that the user does not alter any undocumented attributes or module variables unless they are absolutely sure of what they do and the consequences of their changes. These include variables such as SHARED_ROOT and SIMULATION_ROOT.
-
get_disk_error_chances( simulation_epochs ) [source] -
Defines the probability of a file block being corrupted while stored on the disk of a network node.
Note
The recommended value should be based on the paper An Analysis of Data Corruption in the Storage Stack. Thus the current implementation follows this formula:
(MAX_EPOCHS / MONTH_EPOCHS) * P(Xt ≥ L)
The notation P(Xt ≥ L) denotes the probability of a disk developing at least L checksum mismatches within T months since the disk’s first use in the field, as described in the linked paper.
- Parameters
-
simulation_epochs ( int ) – The number of epochs the simulation is expected to run, assuming no failures occur.
- Returns
-
A two-element list containing, respectively, the probability of losing and the probability of not losing a file block due to disk errors, on a per-epoch basis.
- Return type
-
List[ float ]
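The formula above can be sketched as follows. This is a minimal illustration and not the module's implementation: the function name `disk_error_chances_sketch` and the `p_mismatch` parameter (standing in for P(Xt ≥ L)) are assumptions, the clamping to [0, 1] is an added safeguard, and the value 0.01 used in the usage line is a placeholder rather than a figure taken from the paper.

```python
from typing import List

MONTH_EPOCHS = 21600  # default value documented in this module


def disk_error_chances_sketch(simulation_epochs: int, p_mismatch: float) -> List[float]:
    """Return [P(loss), P(no loss)] following the documented formula.

    `p_mismatch` is a hypothetical stand-in for P(Xt >= L); it is an
    assumption of this sketch, not a value defined by the module.
    """
    if not 0.0 <= p_mismatch <= 1.0:
        raise ValueError("p_mismatch must be a probability in [0, 1]")
    # (MAX_EPOCHS / MONTH_EPOCHS) * P(Xt >= L)
    p_loss = (simulation_epochs / MONTH_EPOCHS) * p_mismatch
    # Assumption: clamp so the result is always a valid probability.
    p_loss = min(max(p_loss, 0.0), 1.0)
    return [p_loss, 1.0 - p_loss]


# Placeholder probability, one simulated month.
chances = disk_error_chances_sketch(21600, 0.01)
```

The two elements always sum to one, so the list can be used directly as a pair of weights when sampling whether a block is lost.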
-
set_blocks_count( n ) [source] -
Changes the BLOCKS_COUNT constant value at run time.
-
set_blocks_size( n ) [source] -
Changes the BLOCKS_SIZE constant value at run time to the given n bytes.
-
set_loss_chance( v ) [source] -
Changes the LOSS_CHANCE constant value at run time.
-
set_replication_level( n ) [source] -
Changes the REPLICATION_LEVEL constant value at run time.
-
ATOL: float = 0.05 -
Defines the maximum amount of absolute positive or negative deviation that a current distribution cv_ can have from the desired steady state v_, in order for the distributions to be considered equal and thus marking the epoch as convergent.
This constant will be used by app.domain.cluster_groups.SGCluster.equal_distributions() along with a relative tolerance that is the minimum value in v_.
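A tolerance check of this kind might look like the sketch below. It is not the code of `equal_distributions()`: the function name `distributions_match` is hypothetical, and the way the two tolerances are combined (`atol + rtol * |target|`, the convention used by `numpy.isclose`) is an assumption.

```python
from typing import Sequence

ATOL = 0.05  # default value documented in this module


def distributions_match(cv_: Sequence[float], v_: Sequence[float], atol: float = ATOL) -> bool:
    """Return True when every entry of cv_ is within tolerance of v_.

    Assumption: the relative tolerance is the minimum value in v_, as the
    text above describes, and tolerances combine as atol + rtol * |target|.
    """
    if len(cv_) != len(v_):
        raise ValueError("distributions must have the same length")
    rtol = min(v_)
    return all(abs(c - t) <= atol + rtol * abs(t) for c, t in zip(cv_, v_))


close_enough = distributions_match([0.30, 0.70], [0.32, 0.68])  # small deviation
too_far = distributions_match([0.10, 0.90], [0.50, 0.50])       # large deviation
```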
-
BLOCKS_COUNT: int = 46 -
Defines into how many FileBlockData instances a file is divided. Either use this or BLOCKS_SIZE, but not both.
-
BLOCKS_SIZE: int = 1048576 -
Defines the raw size of each file block before it’s wrapped in a FileBlockData instance object.
Some possible values include { 32KB = 32768B; 128KB = 131072B; 512KB = 524288B; 1MB = 1048576B; 20MB = 20971520B }.
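To illustrate why only one of BLOCKS_COUNT and BLOCKS_SIZE should drive the split, the sketch below divides raw bytes either by a fixed block size or by a target block count. The helper names `split_by_size` and `split_by_count` are hypothetical and the logic is an assumption about how such a split could work, not the simulator's own code.

```python
import math
from typing import List


def split_by_size(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def split_by_count(data: bytes, blocks_count: int) -> List[bytes]:
    """Cut data into at most blocks_count blocks of near-equal size."""
    if blocks_count <= 0:
        raise ValueError("blocks_count must be positive")
    # Derive the block size from the desired count, then reuse split_by_size.
    block_size = max(1, math.ceil(len(data) / blocks_count))
    return split_by_size(data, block_size)


payload = b"abcdefghij"
by_size = split_by_size(payload, 4)
by_count = split_by_count(payload, 4)
```

Because the count-based split derives a size internally, setting both constants at once would leave it ambiguous which one determines the final layout.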
-
DEBUG: bool = False -
Indicates if some debug related actions or prints to the terminal should be performed.
-
DELIVER_CHANCE: float = 0.96 -
Defines the probability of a message being delivered to a destination, in the simulation environment.
-
LOSS_CHANCE: float = 0.04 -
Defines the probability of a message not being delivered to a destination due to network link problems, in the simulation environment.
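As a rough illustration of how the two complementary probabilities can be used, the sketch below samples message deliveries with a seeded random generator. The function `is_delivered` is hypothetical and is not part of the simulator's API.

```python
import random

DELIVER_CHANCE = 0.96  # default value documented in this module
LOSS_CHANCE = 0.04     # default value documented in this module


def is_delivered(rng: random.Random) -> bool:
    """Sample one message: True when it reaches its destination."""
    return rng.random() < DELIVER_CHANCE


rng = random.Random(42)  # fixed seed so the run is repeatable
attempts = 10_000
delivered = sum(is_delivered(rng) for _ in range(attempts))
delivery_ratio = delivered / attempts  # should be close to DELIVER_CHANCE
```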
-
MATLAB_DIR: str = 'hive/docs/scripts/matlab' -
Path to the folder where MATLAB scripts are located. Used by MatlabEngineContainer.
-
MAX_REPLICATION_DELAY: int = 3 -
The maximum number of epoch time steps that file block replicas take to be regenerated after they are lost.
-
MIN_CONVERGENCE_THRESHOLD: int = 0 -
The number of consecutive epoch time steps that a SGCluster must converge before epochs start being marked with verified convergence in app.domain.helpers.smart_dataclasses.LoggingData.convergence_set.
-
MIN_REPLICATION_DELAY: int = 1 -
The minimum number of epoch time steps that file block replicas take to be regenerated after they are lost.
-
MONTH_EPOCHS: int = 21600 -
Defines how many epochs (discrete time steps) represent one month. With the default value of 21600, each epoch represents two minutes. See get_disk_error_chances().
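The two-minute figure follows from simple arithmetic, sketched below under the assumption of a 30-day month. The helper name `minutes_per_epoch` is hypothetical.

```python
MONTH_EPOCHS = 21600  # default value documented in this module


def minutes_per_epoch(month_epochs: int = MONTH_EPOCHS, days_per_month: int = 30) -> float:
    """Return how many real-time minutes one epoch represents."""
    minutes_per_month = days_per_month * 24 * 60  # 43200 for a 30-day month
    return minutes_per_month / month_epochs


default_resolution = minutes_per_epoch()        # 43200 / 21600
finer_resolution = minutes_per_epoch(43200)     # one epoch per minute
```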
-
NEWSCAST_CACHE_SIZE: int = 20 -
The maximum number of neighbors a NewscastNode view (app.domain.network_nodes.NewscastNode) can have at any given time.
-
OUTFILE_ROOT: str = 'hive/app/static/outfiles' -
Path to the folder where simulation output files are located.
-
REPLICATION_LEVEL: int = 3 -
The amount of replicas each file block has.
-
RESOURCES_ROOT: str = 'hive/app/static/resources' -
Path to the folder where miscellaneous files are located.
-
RTOL: float = 0.05 -
Defines the maximum amount of relative positive or negative deviation that a current distribution cv_ can have from the desired steady state v_, in order for the distributions to be considered equal and thus marking the epoch as convergent.
This constant will be used by app.domain.cluster_groups.SGCluster.equal_distributions() along with a relative tolerance that is the minimum value in v_.
-
SHARED_ROOT: str = 'hive/app/static/shared' -
Path to the folder where files to be persisted during the simulation are located.
-
SIMULATION_ROOT: str = 'hive/app/static/simfiles' -
Path to the folder where simulation files to be executed by app.hive_simulation are located.
app.hive_simulation
This script’s functions are used to start simulations.
You can start a simulation by executing the following command:
$ python hive_simulation.py --file=a_simulation_name.json --iterations=30
You can also execute all simulation files that exist in SIMULATION_ROOT by instead executing:
$ python hive_simulation.py -d -i 24
If you wish to execute multiple simulations in parallel (to save time), you can use the -t or --threading flag in either of the previously specified commands. The threading flag expects an integer that specifies the maximum number of worker threads. For example:
$ python hive_simulation.py -d --iterations=1 --threading=2
Warning
Python’s ThreadPoolExecutor conceals/suppresses any uncaught exceptions, i.e., simulations may fail to execute or log items properly and no debug information will be provided.
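The behavior described in this warning comes from how `concurrent.futures` works: an exception raised inside a worker is stored on the returned `Future` and only resurfaces when `result()` is called. The sketch below shows one way to surface such failures. It is not the script's own code, and `run_simulation` and `run_all` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def run_simulation(sid: int) -> int:
    """Hypothetical stand-in for one simulation; fails for odd identifiers."""
    if sid % 2:
        raise RuntimeError(f"simulation {sid} failed")
    return sid


def run_all(start: int, stop: int, max_workers: int = 2) -> Tuple[List[int], Dict[int, str]]:
    """Run simulations start..stop-1 and report both successes and failures."""
    completed: List[int] = []
    failed: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_simulation, sid): sid for sid in range(start, stop)}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                # result() re-raises whatever the worker raised, if anything.
                completed.append(future.result())
            except Exception as exc:
                failed[sid] = repr(exc)
    return sorted(completed), failed


completed, failed = run_all(1, 5)
```

Without the `future.result()` call, the failures for identifiers 1 and 3 would pass silently, which matches the symptom described above.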
If you don’t have a simulation file yet, run the following instead:
$ python simfile_generator.py --file=filename.json
Note
For the simulation to run without errors you must ensure that:
The specified simulation files exist in SIMULATION_ROOT.
Any file used by the simulation, e.g., a picture or a .pptx document, is accessible in SHARED_ROOT.
An output file directory exists, with the default path being OUTFILE_ROOT.
-
__makedirs__( ) [source] -
Helper method that creates the required simulation working directories if they do not exist.
- Return type
-
_parallel_main( start , stop ) [source] -
Helper method that initializes a multi-threaded simulation.
- Parameters
-
-
start ( int ) – A number that marks the first desired identifier for the simulations that will execute.
-
stop ( int ) – A number that marks the last desired identifier for the simulations that will execute. Usually a sum of start and the total number of iterations specified by the user in the script’s arguments.
-
- Return type
-
_simulate( simfile_name , sid ) [source] -
Helper method that orders execution of one simulation instance.
-
_single_main( start , stop ) [source] -
Helper function that initializes a single-threaded simulation.
- Parameters
-
-
start ( int ) – A number that marks the first desired identifier for the simulations that will execute.
-
stop ( int ) – A number that marks the last desired identifier for the simulations that will execute. Usually a sum of start and the total number of iterations specified by the user in the script’s arguments.
-
- Return type
-
_validate_simfile( simfile_name ) [source] -
Asserts if simulation can proceed with user specified file.
- Parameters
-
simfile_name ( str ) – The name of the simulation file, including extension, whose existence inside SIMULATION_ROOT will be checked.
- Return type
-
get_next_scenario( k ) [source] -
Function used for one-to-one testing of different swarm guidance configurations.
Note
This method should only be used when app.environment_settings.DEBUG is set to True.
- Parameters
-
k ( str ) – A string identifying the pool of (matrix, vector) pairs from which the scenario is taken. Usually, a string representation of an integer that corresponds to the network size being tested.
- Returns
-
A topology matrix and a random equilibrium vector that can be used to generate Markov chains used for Swarm Guidance.
- Return type
-
Tuple[ numpy.ndarray , numpy.ndarray ]
app.mixing_rate_sampler
This is a non-essential module used for convex optimization prototyping.
This functionality tests and compares the mixing rate of various Markov matrices.
You can start a test by executing the following command:
$ python mixing_rate_sampler.py --samples=1000
You can also specify the names of the functions used to generate Markov matrices like so:
$ python mixing_rate_sampler.py -s 10 -f afunc,anotherfunc,yetanotherfunc
Note
Default functions set { “new_mh_transition_matrix”, “new_sdp_mh_transition_matrix”, “new_go_transition_matrix”, “new_mgo_transition_matrix” }
-
main( ) [source] -
Compares the mixing rate of the Markov matrices generated by all specified functions, samples times.
The execution of the main method results in a JSON file being output to the MIXING_RATE_SAMPLE_ROOT folder.
-
_ResultsDict: OrderedDict [ str , _SizeResultsDict ]
app.simfile_generator
This script’s functions are used to create a simulation file for the user.
You can create a simulation file by following the instructions that appear in your terminal when running the following command:
$ python simfile_generator.py --file=filename.json
Note
Simulation files are placed inside the SIMULATION_ROOT directory. Any file used to simulate persistence must be inside the SHARED_ROOT directory.
-
_in_yes_no( message ) [source] -
Asks the user to reply with yes or no to a message.
-
_init_nodes_uptime( ) [source] -
Creates a record containing network nodes’ uptime.
- Returns
-
A dictionary where keys are network node identifiers and values are their respective uptime values.
- Return type
-
_init_persisting_dict( ) [source] -
Creates the “persisting” key of simulation file.
- Returns
-
A dictionary containing data regarding the files to be shared in the system.
- Return type
-
Dict[ str , Any]
-
_input_bounded_float( message , lower_bound = 0.0 , upper_bound = 100.0 ) [source] -
Obtains a float entered by the user within the specified closed interval.
-
_input_bounded_integer( message , lower_bound = 2 , upper_bound = 10000000 ) [source] -
Obtains an integer entered by the user within the specified closed interval.
- Parameters
- Returns
-
An integer entered by the user.
- Return type
-
_input_character_option( message , white_list ) [source] -
Obtains a character entered by the user from a predefined set.
- Parameters
- Returns
-
The character that represents the initial distribution of files, desired by the user, in an instance of a domain.cluster_groups class.
- Return type
-
_input_filename( message ) [source] -
Asks the user to input the name of a file in the command line terminal.
A warning message is displayed if the specified file does not exist inside SHARED_ROOT.
Note
Defaults to "FBZ_0134.NEF" when input is blank. This file should be present inside SHARED_ROOT unless it was previously deleted by the user.
-
yield_label( ) [source] -
Used to generate an arbitrary number of unique labels.
- Examples:
-
The following code snippets illustrate the result of calling this method n times.
>>> import itertools
>>> n = 4
>>> list(itertools.islice(yield_label(), n))
['a', 'b', 'c', 'd']
>>> n = 4 + 26
>>> labels = list(itertools.islice(yield_label(), n))
>>> labels[:4] + labels[-4:]
['a', 'b', 'c', 'd', 'aa', 'ab', 'ac', 'ad']
- Yields
-
The next string label in the sequence.
- Return type
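A generator with the behavior shown in the examples could be written as in the sketch below. This is an assumption about one possible implementation, named `yield_label_sketch` to make clear it is not the module's own `yield_label`.

```python
import itertools
import string
from typing import Iterator


def yield_label_sketch() -> Iterator[str]:
    """Yield 'a'..'z', then 'aa', 'ab', ... with ever longer labels."""
    for length in itertools.count(1):
        # All lowercase combinations of the current length, in alphabetical order.
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


first_four = list(itertools.islice(yield_label_sketch(), 4))
first_thirty = list(itertools.islice(yield_label_sketch(), 30))
```

Since the generator never terminates on its own, callers bound it with `itertools.islice` or a similar mechanism.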
app.sample_scenario_fixer
Excludes all <topologies, equilibrium> pairs in the scenarios.json file that are not synthesizable by our implementation of Metropolis-Hastings. This JSON file is created using the script sample_scenario_generator.
To execute this file, run the following command:
$ python sample_scenario_fixer.py
Note
This script expects to fix a file named “scenarios.json” under the RESOURCES_ROOT directory. If you wish to modify this behavior, you need to customize the script to accept one additional argument which indicates the name of the file to be fixed.
-
__select_fastest_topology__( a , v_ ) [source] -
Emulates Swarm Guidance Clusters’ fastest topology selection for MH algorithms.
- Parameters
-
-
a ( numpy.ndarray ) –
-
v_ ( numpy.ndarray ) –
-
- Return type
app.sample_scenario_generator
Creates an arbitrary number of symmetric connected topologies and equilibrium vectors that can be read during simulations for one-to-one comparison between algorithms. There is a small chance that generated pairs cannot be solved by heuristic Markov chain generating algorithms such as our implementation of Metropolis-Hastings. To ensure that the algorithm can be used over the generated pairs, run sample_scenario_fixer, which removes all invalid entries from the generated JSON file.
To execute this file run the following command (both arguments are optional):
$ python sample_scenario_generator.py --samples=1000 --network_sizes=8,16,32
Note
The output of this script is a file named “scenarios.json” under the RESOURCES_ROOT directory. If you wish to modify this behavior, you need to customize the script to accept one additional argument which then saves the file under a different name. You also need to ensure that all other uses of “scenarios.json” are changed accordingly.
app.type_hints
-
ClusterDict: Dict [ str , ClusterType ]
-
ClusterType: Union [ cg.Cluster , cg.SGCluster , cg.SGClusterExt , cg.HDFSCluster , cg.NewscastCluster ]
-
HttpResponse: Union [ int , e.HttpCodes ]
-
MasterType: Union [ ms.Master , ms.SGMaster , ms.HDFSMaster , ms.NewscastMaster ]
-
NodeType: Union [ nn.Node , nn.SGNode , nn.SGNodeExt , nn.HDFSNode , nn.NewscastNode ]
-
ReplicasDict: Dict [ int , sd.FileBlockData ]