App Documentation

Subpackages

Submodules

app.environment_settings

Module with simulation and project related variables.

This module holds multiple constants that are used throughout the simulation’s lifetime, including initialization and execution.

Note

To configure the number of available network nodes in the system, the initial size of the file Cluster Group that works on the durability of a file, the way files are distributed among the clusters’ nodes at the start of a simulation, and the actual name of the file whose persistence is being simulated, create a simulation file using this script and follow the respective instructions. To run the script, type the following in your command line terminal:

$ python simfile_generator.py --file=filename.json

It is also strongly recommended that users do not alter any undocumented attributes or module variables unless they are absolutely sure of what they do and of the consequences of the change. These include variables such as SHARED_ROOT and SIMULATION_ROOT.

get_disk_error_chances ( simulation_epochs ) [source]

Defines the probability of a file block being corrupted while stored on the disk of a network node.

Note

The recommended value should be based on the paper named An Analysis of Data Corruption in the Storage Stack. Thus the current implementation follows this formula:

(MAX_EPOCHS / MONTH_EPOCHS) * P(Xt ≥ L)

The notation P(Xt ≥ L) denotes the probability of a disk developing at least L checksum mismatches within T months since the disk’s first use in the field, as described in the linked paper.

Parameters: simulation_epochs ( int ) – The number of epochs the simulation is expected to run, assuming no failures occur.
Returns: A two-element list containing, respectively, the probability of losing and the probability of not losing a file block due to disk errors, on a per-epoch basis.
Return type: List[ float ]
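
The formula in the note above can be read literally as in the following sketch. It is an illustration only, not the module’s implementation: the function name disk_error_chances_sketch and the p_mismatch argument (standing in for P(Xt ≥ L)) are hypothetical, and the value passed below is a placeholder rather than a figure taken from the paper. The real function may scale the result differently to obtain a per-epoch value.

```python
from typing import List

MONTH_EPOCHS = 21600  # default value documented for this module


def disk_error_chances_sketch(simulation_epochs: int, p_mismatch: float) -> List[float]:
    """Literal reading of (MAX_EPOCHS / MONTH_EPOCHS) * P(Xt >= L)."""
    months = simulation_epochs / MONTH_EPOCHS
    # Clamp so that the result remains a valid probability.
    p_loss = min(1.0, months * p_mismatch)
    return [p_loss, 1.0 - p_loss]


# Placeholder inputs: a two-month simulation and an assumed P(Xt >= L) of 0.01.
chances = disk_error_chances_sketch(simulation_epochs=43200, p_mismatch=0.01)
```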

set_blocks_count ( n ) [source]

Changes BLOCKS_COUNT constant value at run time.

Parameters: n ( int ) – The new BLOCKS_COUNT value.
Return type: None

set_blocks_size ( n ) [source]

Changes BLOCKS_SIZE constant value at run time to the given n bytes.

Parameters: n ( int ) – The new BLOCKS_SIZE value, in bytes.
Return type: None

set_loss_chance ( v ) [source]

Changes LOSS_CHANCE constant value at run time.

Parameters: v ( float ) – The new LOSS_CHANCE value.
Return type: None

set_replication_level ( n ) [source]

Changes REPLICATION_LEVEL constant value at run time.

Parameters: n ( int ) – The new REPLICATION_LEVEL value.
Return type: None

ATOL : float = 0.05

Defines the maximum amount of absolute positive or negative deviation that a current distribution cv_ can have from the desired steady state v_, in order for the distributions to be considered equal and thus marking the epoch as convergent.

This constant will be used by app.domain.cluster_groups.SGCluster.equal_distributions() along with a relative tolerance that is the minimum value in v_.
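
A comparison of this kind can be sketched with numpy.allclose, which checks |cv_ - v_| <= atol + rtol * |v_| element-wise. The function below is a hypothetical stand-in for SGCluster.equal_distributions(), written only from the description above; the actual method may differ.

```python
import numpy as np

ATOL = 0.05  # documented default


def equal_distributions_sketch(cv_: np.ndarray, v_: np.ndarray) -> bool:
    # Relative tolerance is the smallest entry of v_, as the text describes.
    return bool(np.allclose(cv_, v_, rtol=float(v_.min()), atol=ATOL))


v_ = np.array([0.5, 0.3, 0.2])
close = equal_distributions_sketch(np.array([0.52, 0.29, 0.19]), v_)
far = equal_distributions_sketch(np.array([0.80, 0.10, 0.10]), v_)
```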

BLOCKS_COUNT : int = 46: Defines how many FileBlockData instances a file is divided into. Use either this or BLOCKS_SIZE, but not both.

BLOCKS_SIZE : int = 1048576

Defines the raw size of each file block before it’s wrapped in a FileBlockData instance object.

Some possible values include { 32KB = 32768B; 128KB = 131072B; 512KB = 524288B; 1MB = 1048576B; 20MB = 20971520B }.
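
As a worked example of how a block size translates into a block count, the sketch below assumes a file is split into fixed-size blocks with a possibly shorter final block. The helper name blocks_needed and the 48,000,000-byte file size are illustrative assumptions, not part of the module.

```python
BLOCKS_SIZE = 1048576  # 1MB, documented default


def blocks_needed(file_size: int, block_size: int = BLOCKS_SIZE) -> int:
    """Number of blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


count = blocks_needed(48_000_000)
```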

DEBUG : bool = False: Indicates if some debug related actions or prints to the terminal should be performed.

DELIVER_CHANCE : float = 0.96: Defines the probability of a message being delivered to a destination, in the simulation environment.

LOSS_CHANCE : float = 0.04: Defines the probability of a message not being delivered to a destination due to network link problems, in the simulation environment.

MATLAB_DIR : str = 'hive/docs/scripts/matlab': Path to the folder where MATLAB scripts are located. Used by MatlabEngineContainer.

MAX_REPLICATION_DELAY : int = 3: The maximum number of epoch time steps that file block replicas take to be regenerated after they are lost.

MIN_CONVERGENCE_THRESHOLD : int = 0: The number of consecutive epoch time steps that a SGCluster must converge before epochs start being marked with verified convergence in app.domain.helpers.smart_dataclasses.LoggingData.convergence_set.

MIN_REPLICATION_DELAY : int = 1: The minimum number of epoch time steps that file block replicas take to be regenerated after they are lost.

MONTH_EPOCHS : int = 21600: Defines how many epochs (discrete time steps) represent one month. With the default value of 21600, each epoch represents two minutes of a 30-day month. See get_disk_error_chances().

NEWSCAST_CACHE_SIZE : int = 20: The maximum amount of neighbors a NewscastNode view (app.domain.network_nodes.NewscastNode) can have at any given time.

OUTFILE_ROOT : str = 'hive/app/static/outfiles': Path to the folder where simulation output files are located.

REPLICATION_LEVEL : int = 3: The amount of replicas each file block has.

RESOURCES_ROOT : str = 'hive/app/static/resources': Path to the folder where miscellaneous files are located.

RTOL : float = 0.05

Defines the maximum amount of relative positive or negative deviation that a current distribution cv_ can have from the desired steady state v_, in order for the distributions to be considered equal and thus marking the epoch as convergent.

This constant will be used by app.domain.cluster_groups.SGCluster.equal_distributions() along with a relative tolerance that is the minimum value in v_.

SHARED_ROOT : str = 'hive/app/static/shared': Path to the folder where files to be persisted during the simulation are located.

SIMULATION_ROOT : str = 'hive/app/static/simfiles': Path to the folder where simulation files to be executed by app.hive_simulation are located.

app.hive_simulation

This script’s functions are used to start simulations.

You can start a simulation by executing the following command:

$ python hive_simulation.py --file=a_simulation_name.json --iterations=30

You can also execute all simulation files that exist in SIMULATION_ROOT by instead executing:

$ python hive_simulation.py -d -i 24

If you wish to execute multiple simulations in parallel (to save time), you can use the -t or --threading flag in either of the previously specified commands. The threading flag expects an integer that specifies the maximum number of worker threads. For example:

$ python hive_simulation.py -d --iterations=1 --threading=2

Warning

Python’s ThreadPoolExecutor conceals any uncaught exceptions raised inside worker threads, i.e., simulations may fail to execute or to log items properly and no debug information will be provided.
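
The reason is that an exception raised in a submitted task is stored on its Future and only surfaces when result() or exception() is called. The sketch below shows one way to collect such failures; run_simulation is a hypothetical stand-in for a simulation run and is not part of this script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_simulation(sid: int) -> int:
    # Hypothetical stand-in for one simulation; id 2 fails on purpose.
    if sid == 2:
        raise RuntimeError(f"simulation {sid} failed")
    return sid


failures = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to the simulation id it was submitted with.
    futures = {executor.submit(run_simulation, sid): sid for sid in range(1, 4)}
    for future in as_completed(futures):
        error = future.exception()  # None when the task finished normally
        if error is not None:
            failures[futures[future]] = error
```

Without the call to exception() (or result()), the RuntimeError above would never be reported.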

If you don’t have a simulation file yet, run the following instead:

$ python simfile_generator.py --file=filename.json

Note

For the simulation to run without errors you must ensure that:

The specified simulation files exist in SIMULATION_ROOT .

Any file used by the simulation, e.g., a picture or a .pptx document, is accessible in SHARED_ROOT.

An output file directory exists; its default path is OUTFILE_ROOT.

__makedirs__ ( ) [source]

Helper method that creates the required simulation working directories if they do not exist.

Return type: None
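
Creating directories only when they are missing is commonly done with os.makedirs and exist_ok=True. The sketch below illustrates this under the assumption that the helper works this way; make_working_dirs is a hypothetical name, and a temporary directory is used here instead of the real SHARED_ROOT, SIMULATION_ROOT and OUTFILE_ROOT paths.

```python
import os
import tempfile


def make_working_dirs(*paths: str) -> None:
    """Create each directory (and missing parents) if it does not exist yet."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


base = tempfile.mkdtemp()
dirs = [os.path.join(base, name) for name in ("shared", "simfiles", "outfiles")]
make_working_dirs(*dirs)
make_working_dirs(*dirs)  # calling it again is harmless
```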

_parallel_main ( start , stop ) [source]

Helper method that initializes a multi-threaded simulation.

Parameters

start ( int ) – A number that marks the first desired identifier for the simulations that will execute.
stop ( int ) – A number that marks the last desired identifier for the simulations that will execute. Usually the sum of start and the total number of iterations specified by the user in the script’s arguments.

Return type

None

_simulate ( simfile_name , sid ) [source]

Helper method that orders execution of one simulation instance.

Parameters

simfile_name ( str ) – The name of the simulation file to be executed.
sid ( int ) – A sequence number that identifies the simulation execution instance.

Return type

None

_single_main ( start , stop ) [source]

Helper function that initializes a single-threaded simulation.

Parameters

start ( int ) – A number that marks the first desired identifier for the simulations that will execute.
stop ( int ) – A number that marks the last desired identifier for the simulations that will execute. Usually the sum of start and the total number of iterations specified by the user in the script’s arguments.

Return type

None

_validate_simfile ( simfile_name ) [source]

Asserts that the simulation can proceed with the user-specified file.

Parameters: simfile_name ( str ) – The name of the simulation file, including extension, whose existence inside SIMULATION_ROOT will be checked.
Return type: None

get_next_scenario ( k ) [source]

Function used for one-to-one testing of different swarm guidance configurations.

Note

This method should only be used when app.environment_settings.DEBUG is set to True.

Parameters: k ( str ) – A string identifying the pool of (matrix, vector) pairs from which to get the scenario. Usually a string representation of an integer that corresponds to the network size being tested.
Returns: A topology matrix and a random equilibrium vector that can be used to generate Markov chains used for Swarm Guidance.
Return type: Tuple[ numpy.ndarray , numpy.ndarray ]

app.mixing_rate_sampler

This is a non-essential module used for convex optimization prototyping.

This functionality tests and compares the mixing rate of various Markov matrices.

You can start a test by executing the following command:

$ python mixing_rate_sampler.py --samples=1000

You can also specify the names of the functions used to generate Markov matrices like so:

$ python mixing_rate_sampler.py -s 10 -f afunc,anotherfunc,yetanotherfunc

Note

The default function set is { “new_mh_transition_matrix”, “new_sdp_mh_transition_matrix”, “new_go_transition_matrix”, “new_mgo_transition_matrix” }.

main ( ) [source]

Compares the mixing rate of the Markov matrices generated by all specified functions, samples times.

The execution of the main method results in a JSON file written to the MIXING_RATE_SAMPLE_ROOT folder.
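
One common measure of how fast a Markov chain mixes is the second-largest eigenvalue modulus (SLEM) of its transition matrix: the smaller the SLEM, the faster the chain converges. Whether this module uses exactly this measure is an assumption; the sketch below only illustrates the idea, and slem is a hypothetical helper name.

```python
import numpy as np


def slem(matrix: np.ndarray) -> float:
    """Second-largest eigenvalue modulus of a square stochastic matrix."""
    moduli = np.sort(np.abs(np.linalg.eigvals(matrix)))[::-1]
    return float(moduli[1])


# A 2x2 row-stochastic example whose eigenvalues are 1 and 0.4.
m = np.array([[0.9, 0.1],
              [0.5, 0.5]])
rate = slem(m)
```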

_ResultsDict : OrderedDict [ str , _SizeResultsDict ]

_SizeResultsDict : OrderedDict [ str , List [ float ] ]

app.simfile_generator

This script’s functions are used to create a simulation file for the user.

You can create a simulation file by following the instructions that appear in your terminal when running the following command:

$ python simfile_generator.py --file=filename.json

Note

Simulation files are placed inside the SIMULATION_ROOT directory. Any file used to simulate persistence must be inside the SHARED_ROOT directory.

_in_yes_no ( message ) [source]

Asks the user to reply with yes or no to a message.

Parameters: message ( str ) – The message to be printed to the user upon first input request.
Returns: True if the user answers yes, otherwise False.
Return type: bool

_init_nodes_uptime ( ) [source]

Creates a record containing network nodes’ uptime.

Returns: A dictionary where keys are network node identifiers and values are their respective uptime values.
Return type: Dict[ str , float ]

_init_persisting_dict ( ) [source]

Creates the “persisting” key of simulation file.

Returns: A dictionary containing data about the files to be shared in the system.
Return type: Dict[ str , Any]

_input_bounded_float ( message , lower_bound = 0.0 , upper_bound = 100.0 ) [source]

Obtains a user-supplied float within the specified closed interval.

Parameters

message ( str ) – The message to be printed to the user upon first input request.
lower_bound ( float ) – Any input smaller than lower_bound is rejected.
upper_bound ( float ) – Any input bigger than upper_bound is rejected.

Returns

A float entered by the user.

Return type

float
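
A bounded input helper of this kind is usually a loop that re-prompts until the value parses and falls inside the interval. The sketch below is written from the description above and is not the script’s source; the read parameter is a hypothetical addition that lets the example run with scripted answers instead of a live terminal.

```python
from typing import Callable


def input_bounded_float(message: str,
                        lower_bound: float = 0.0,
                        upper_bound: float = 100.0,
                        read: Callable[[str], str] = input) -> float:
    """Keep asking until the reply is a float in [lower_bound, upper_bound]."""
    prompt = message
    while True:
        raw = read(prompt)
        prompt = f"Enter a number between {lower_bound} and {upper_bound}: "
        try:
            value = float(raw)
        except ValueError:
            continue  # not a number, ask again
        if lower_bound <= value <= upper_bound:
            return value


# Scripted replies: one non-numeric, one out of range, one valid.
answers = iter(["abc", "150", "42.5"])
value = input_bounded_float("Uptime (%): ", read=lambda _prompt: next(answers))
```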

_input_bounded_integer ( message , lower_bound = 2 , upper_bound = 10000000 ) [source]

Obtains a user-supplied integer within the specified closed interval.

Parameters

message ( str ) – The message to be printed to the user upon first input request.
lower_bound ( int ) – Any input equal to or smaller than lower_bound is rejected.
upper_bound ( int ) – Any input equal to or bigger than upper_bound is rejected.

Returns

An integer entered by the user.

Return type

int

_input_character_option ( message , white_list ) [source]

Obtains a user-supplied character within a predefined set.

Parameters

message ( str ) – The message to be printed to the user upon first input request.
white_list ( List [ str ] ) – A list of valid option characters.

Returns

The character that represents the initial distribution of files in an instance of a domain.cluster_groups class, as desired by the user.

Return type

str

_input_filename ( message ) [source]

Asks the user to input the name of a file in the command line terminal.

A warning message is displayed if the specified file does not exist inside SHARED_ROOT.

Note

Defaults to "FBZ_0134.NEF" when input is blank. This file should be present inside SHARED_ROOT unless it was previously deleted by the user.

Parameters: message ( str ) – The message to be printed to the user upon first input request.
Returns: A file name with extension.
Return type: str

yield_label ( ) [source]

Used to generate an arbitrary number of unique labels.

Examples:

The following code snippets illustrate the result of calling this method n times.

>>> import itertools
>>> n = 4
>>> list(itertools.islice(yield_label(), n))
['a', 'b', 'c', 'd']

>>> n = 4 + 26
>>> list(itertools.islice(yield_label(), n))
['a', 'b', 'c', 'd', ..., 'aa', 'ab', 'ac', 'ad']

Yields: The next string label in the sequence.
Return type: str
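
A generator with the behavior shown in the examples can be sketched with itertools.product over the lowercase alphabet. This is one possible implementation, named yield_label_sketch to make clear that it is not the script’s own code.

```python
import itertools
import string
from typing import Iterator


def yield_label_sketch() -> Iterator[str]:
    """Yield a, b, ..., z, aa, ab, ..., zz, aaa, ... without end."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


labels = list(itertools.islice(yield_label_sketch(), 4 + 26))
```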

app.sample_scenario_fixer

Excludes all <topology, equilibrium> pairs in the scenarios.json file that cannot be synthesized by our implementation of Metropolis Hastings. This JSON file is created using the script sample_scenario_generator.

To execute this file run the following command:

$ python sample_scenario_fixer.py

Note

This script expects to fix a file named “scenarios.json” under the RESOURCES_ROOT directory. If you wish to modify this behavior you need to customize the script to accept one additional argument which indicates the name of the file to be fixed.

__select_fastest_topology__ ( a , v_ ) [source]

Emulates Swarm Guidance Clusters’ fastest topology selection for MH algorithms.

Parameters

a ( numpy.ndarray ) –
v_ ( numpy.ndarray ) –

Return type

numpy.ndarray

__validate_mc__ ( m , v_ ) [source]

Asserts whether the input Markov matrix converges to the desired equilibrium.

Parameters

m ( pandas.core.frame.DataFrame ) –
v_ ( pandas.core.frame.DataFrame ) –

Return type

bool
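
A convergence check of this kind can be sketched by raising the matrix to a large power and comparing the result with the desired equilibrium. The example below uses plain NumPy arrays rather than the DataFrames of the documented signature, and it assumes a column-stochastic matrix (each column sums to one); both choices, as well as the name validate_mc_sketch and the steps and atol parameters, are assumptions made for illustration.

```python
import numpy as np


def validate_mc_sketch(m: np.ndarray, v_: np.ndarray,
                       steps: int = 500, atol: float = 1e-6) -> bool:
    """Return True if every column of m**steps is approximately v_."""
    limit = np.linalg.matrix_power(m, steps)
    # Broadcast v_ as a column vector against each column of the limit.
    return bool(np.allclose(limit, v_.reshape(-1, 1), atol=atol))


# Column-stochastic example whose stationary distribution is (5/6, 1/6).
m = np.array([[0.9, 0.5],
              [0.1, 0.5]])
converges = validate_mc_sketch(m, np.array([5 / 6, 1 / 6]))
mismatch = validate_mc_sketch(m, np.array([0.5, 0.5]))
```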

app.sample_scenario_generator

Creates an arbitrary number of symmetric connected topologies and equilibrium vectors that can be read during simulations for one-to-one comparison between algorithms. There is a small chance that generated pairs cannot be solved by heuristic Markov chain generating algorithms such as our implementation of Metropolis Hastings. To ensure that the algorithm can be used over the generated pairs, run sample_scenario_fixer, which removes all invalid entries from the generated JSON file.

To execute this file run the following command (both arguments are optional):

$ python sample_scenario_generator.py --samples=1000 --network_sizes=8,16,32

Note

The output of this script is a file named “scenarios.json” under the RESOURCES_ROOT directory. If you wish to modify this behavior, you need to customize the script to accept one additional argument that saves the file under a different name. You also need to ensure that all other uses of “scenarios.json” are changed accordingly.
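
To make the notion of a <topology, equilibrium> pair concrete, the sketch below builds one random symmetric, connected adjacency matrix and one random probability vector. It is not the script’s algorithm: the function name random_scenario, the ring used to guarantee connectivity, and the sampling ranges are all assumptions chosen for illustration.

```python
import numpy as np


def random_scenario(size: int, rng: np.random.Generator):
    """Return a symmetric connected 0/1 topology and a vector summing to one."""
    # Random edges above the diagonal, mirrored to make the matrix symmetric.
    upper = np.triu(rng.integers(0, 2, size=(size, size)), k=1)
    # A ring through all nodes guarantees the topology is connected.
    ring = np.zeros((size, size), dtype=int)
    idx = np.arange(size)
    ring[idx, (idx + 1) % size] = 1
    topology = ((upper + upper.T + ring + ring.T) > 0).astype(int)
    # Strictly positive entries, normalized into a probability vector.
    equilibrium = rng.uniform(0.1, 1.0, size)
    equilibrium /= equilibrium.sum()
    return topology, equilibrium


topology, equilibrium = random_scenario(8, np.random.default_rng(0))
```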

app.type_hints

ClusterDict : Dict [ str , ClusterType ]

ClusterType : Union [ cg.Cluster , cg.SGCluster , cg.SGClusterExt , cg.HDFSCluster , cg.NewscastCluster ]

HttpResponse : Union [ int , e.HttpCodes ]

MasterType : Union [ ms.Master , ms.SGMaster , ms.HDFSMaster , ms.NewscastMaster ]

NodeDict : Dict [ str , NodeType ]

NodeType : Union [ nn.Node , nn.SGNode , nn.SGNodeExt , nn.HDFSNode , nn.NewscastNode ]

ReplicasDict : Dict [ int , sd.FileBlockData ]