app.domain

Subpackages

app.domain.helpers

Submodules

app.domain.cluster_groups

This module contains domain specific classes that represent groups of storage nodes.

class Cluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: object

Represents a group of network nodes ensuring the durability of a file.

id

A unique identifier of the Cluster instance.

Type: str

current_epoch

The simulation’s current epoch.

Type: int

corruption_chances

A two-element list containing the probability of FileBlockData being corrupted and not being corrupted, respectively. See get_disk_error_chances() for corruption chance configuration.

Type: List[float]

master

A reference to a server that coordinates or monitors the Cluster.

Type: Master

members

A collection of network nodes that belong to the Cluster.

Type: NodeDict

_members_view

A list representation of the nodes in members.

Type: List[NodeType]

file

A reference to FileData object that represents the file being persisted by the Cluster instance.

Type: FileData

critical_size

Minimum number of network nodes plus required to exist in the Cluster to assure the target replication level.

Type: int

sufficient_size

Sum of critical_size and the number of nodes expected to fail between two successive recovery phases.

Type: int

original_size

The initial and theoretically optimal Cluster size.

Type: int

redundant_size

Application-specific parameter, which indicates that membership of the Cluster must be pruned.

Type: int

running

Indicates if the Cluster instance is active. Used by Master to manage the simulation processes.

Type: bool

_membership_changed

Flag indicates wether or not _members_view needs to be updated during membership_maintenance(). The variable is set to false at the beggining of every epoch and set to true if the length of off_nodes list return by nodes_execute() is bigger than zero.

Type: bool

_recovery_epoch_sum

Helper attribute that facilitates the storage of the sum of the values returned by all set_recovery_epoch() method calls. Important for logging purposes.

Type: int

_recovery_epoch_calls

Helper attribute that facilitates the storage of the sum of the values returned by all set_recovery_epoch() method calls throughout the current_epoch.

Type: int

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters

master (MasterType) – A reference to an Master object that manages the Cluster being initialized.
file_name (str) – The name of the file the Cluster is responsible for persisting.
members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.
sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

_get_new_members()[source]

Helper method that searches for possible network node by querying the master of the Cluster.

Returns: A dictionary mapping where keys are node identifiers and values are node instances.
Return type: NodeDict

_log_evaluation(plive, ptotal=- 1)[source]

Helper that collects Cluster data and registers it on a logger object.

Parameters

plive (int) – The number of existing parts in the cluster at the simulation’s current epoch at online or suspect nodes.
ptotal (int) – The number of existing parts in the cluster at the simulation’s current epoch. This parameter is optional and may be used or not depending on the intent of the system. As a rule of thumb plive tracks the number of parts that are alive in the system for logging purposes, where as ptotal is used for comparisons and averages, e.g., SGCluster evaluate.

Return type

None

_set_fail(message)[source]

Ends the Cluster instance simulation.

Sets running to False and orders FileData to write collected logs to disk and close it’s out_file stream.

Parameters: message (str) – A short explanation of why the Cluster terminated early.
Return type: None

_setup_epoch(epoch)[source]

Initializes some attributes cluster attributes at the start of an epoch.

This method also forces all of the Clusters members to update their connectivity status before any node is instructed to execute.

Parameters: epoch (int) – The simulation’s current epoch.
Return type: None

complain(complainter, complainee, reason)[source]

Registers a complaint against a possibly offline node.

Note

This method provides no default functionality and should be overridden in sub classes if required.

Parameters

complainter (str) – The identifier of the complaining network node.
complainee (str) – The identifier of the network node being complained about.
reason (app.type_hints.HttpResponse) – The http code that led to the complaint.

Return type

None

evaluate()[source]

Evaluates and logs the health, possibly other parameters, of the Cluster at every epoch.

Return type: None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters: epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.
Returns: False if Cluster failed to persist the file it was responsible for, otherwise True.
Return type: None

get_cluster_status()[source]

Determines the Cluster’s status based on the length of the current members list.

Returns: The status of the Cluster as a string.
Return type: str

get_node()[source]

Retrives a random node from the members of the cluster group, whose status is likely to be online.

Returns: A random network node from members.
Return type: NodeType

maintain(off_nodes)[source]

Offers basic maintenance functionality for Cluster types.

If off_nodes list param as at least one node reference, _membership_changed is set to True.

Parameters: off_nodes (List[th.NodeType]) – A possibly empty of offline nodes.
Return type: None

membership_maintenance()[source]

Attempts to recruits new network nodes to be members of the cluster.

The method updates both members and _members_view.

Returns: A dictionary that is empty if membership did not change.
Return type: NodeDict

nodes_execute()[source]

Queries all members to execute the epoch.

This method logs the amount of lost replicas throughout current_epoch according to the members who went offline and the FileBlockData replicas they posssed and is responsible for setting a replication epoch. Similarly it logs the number of members who disconnected.

Returns: List of members that disconnected during the current_epoch. See app.domain.network_nodes.Node.update_status().
Return type: List[NodeType]

route_part(sender, receiver, replica, is_fresh=False)[source]

Sends a file block replica to some other network node in members.

Parameters

sender (str) – An identifier of the network node who is sending the message.
receiver (str) – The destination network node identifier.
replica (FileBlockData) – The file block replica to be sent specified destination: receiver.
is_fresh (bool) – Prevents recently created replicas from being corrupted, since they are not likely to be corrupted in disk. This argument facilitates simulation.

Returns

An http code sent by the receiver.

Return type

int

set_replication_epoch(replica)[source]

Delegates to set_replication_epoch().

Parameters: replica (domain.helpers.smart_dataclasses.FileBlockData) – The file block replica that was lost.
Return type: None

spread_files(replicas, strat='i')[source]

Distributes a collection of file block replicas among the members of the cluster group.

Parameters

replicas (ReplicasDict) – The FileBlockData replicas, without replication.
strat (str) –
Defines how replicas will be initially distributed in the Cluster. Unless overridden in children of this class the received value of strat will be ignored and will always be set to the default value i.

i
This strategy creates a probability vector containing the normalization of network nodes uptimes' and uses that vector to randomly select which node will receive each replica. There is a bias to give more replicas to the most resillent nodes which results from using the created probability vector.

Return type

None

class HDFSCluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.Cluster

Represents a group of network nodes ensuring the durability of a file in a Hadoop Distributed File System scenario.

Note

Members of HDFSCluster are of type HDFSNode, they do not perform swarm guidance behaviors and instead report with regular heartbeats to their monitors. This class could be a NameNode Server in HDFS or a master server in GFS.

suspicious_nodes

A set containing the identifiers of suspicious network nodes.

Type: set

data_node_heartbeats

A dictionary mapping node identifiers to the number of complaints made against them. Each node has five lives. When they miss five beats in a row, i.e., when the dictionary value count is zero, they are evicted from the cluster.

Type: Dict[str, int]

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters

master (MasterType) – A reference to an Master object that manages the Cluster being initialized.
file_name (str) – The name of the file the Cluster is responsible for persisting.
members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.
sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

evaluate()[source]

Logs the number of existing replicas in the HDFSCluster.

Overrides:: app.domain.cluster_groups.Cluster.evaluate().

Return type: None

maintain(off_nodes)[source]

Evicts any network node whose heartbeats in data_node_heartbeats reached zero.

Extends:: app.domain.cluster_groups.Cluster.execute_epoch().

Parameters: off_nodes (List[NodeType]) – The subset of members who disconnected during the current epoch.
Return type: None

membership_maintenance()[source]

Attempts to recruits new network nodes to be members of the cluster.

The method updates both members and _members_view.

Returns: A dictionary that is empty if membership did not change.
Return type: NodeDict

nodes_execute()[source]

Queries all members to execute the epoch.

Overrides:: app.domain.cluster_groups.Cluster.nodes_execute()

Returns: A collection of members who disconnected during the current epoch. See app.domain.network_nodes.HDFSNode.update_status().
Return type: List[NodeType]

class NewscastCluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.Cluster

Represents a P2P network of nodes performing mean degree aggregation, while simultaneously using Newscast for view shuffling.

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters

master (MasterType) – A reference to an Master object that manages the Cluster being initialized.
file_name (str) – The name of the file the Cluster is responsible for persisting.
members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.
sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

_setup_epoch(epoch)[source]

Initializes some attributes cluster attributes at the start of an epoch.

Extends:: app.domain.cluster_groups.Cluster._setup_epoch()

Parameters: epoch (int) – The simulation’s current epoch.
Return type: None

evaluate()[source]

Prints the epoch’s aggregated peer degree, to the command-line interface.

Return type: None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters: epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.
Returns: False if Cluster failed to persist the file it was responsible for, otherwise True.
Return type: None

log_aggregation(value)[source]

Parameters: value (float) –

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:

app.domain.cluster_groups.Cluster.nodes_execute().

Note:: NewscasterCluster.nodes_execute always returns None.

Returns: A collection of members who disconnected during the current epoch. See app.domain.network_nodes.NewscastNode.update_status().
Return type: List[NodeType]

spread_files(replicas, strat='o')[source]

Distributes a collection of file block replicas among the members of the cluster group.

Overrides:: app.dommain.cluster_groups.Cluster.spread_files()

Parameters

replicas (ReplicasDict) – The FileBlockData replicas, without replication.
strat (str) –
Defines how replicas will be initially distributed in the Cluster. Unless overridden in children of this class the received value of strat will be ignored and will always be set to the default value o.

o
This strategy assumes erasure-coding is being used and that each network node will have no more than one encoded block, i.e., replication level is always equal to one. Note however, that if there are more encoded blocks than there are network nodes, some of these nodes might end up possessing an excessive amount of blocks.

Return type

None

wire_k_out()[source]

Creates a random directed P2P topology.

The initial cache size of each network node, is at most as big as NEWSCAST_CACHE_SIZE.

Note

The topology does not have self loops, because add_neighbor() does not accept node self addition to view. In rare occasions, the selected node out-going edges might all be invalid, this should be a non-issue, as the nodes will eventually join the overaly throughout the simulation.

class SGCluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.Cluster

Represents a group of network nodes persisting a file using swarm guidance algorithm.

v_

Density distribution cluster members must achieve with independent realizations for ideal persistence of the file.

Type: DataFrame

cv_

Tracks the file current density distribution, updated at each epoch.

Type: DataFrame

avg_

Tracks the file average density distribution. Used to assert if throughout the life time of a cluster, the desired density distribution v_ was achieved on average. Differs from cv_ because cv_ is used for instantaneous convergence comparison.

Type: DataFrame

_timer

Used as a logical clock to divide the entries of avg_ when a topology changes.

Type: int

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters

master (MasterType) – A reference to an Master object that manages the Cluster being initialized.
file_name (str) – The name of the file the Cluster is responsible for persisting.
members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.
sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

_log_evaluation(pcount, ptotal=- 1)[source]

Helper that collects Cluster data and registers it on a logger object.

Parameters

plive – The number of existing parts in the cluster at the simulation’s current epoch at online or suspect nodes.
ptotal (int) – The number of existing parts in the cluster at the simulation’s current epoch. This parameter is optional and may be used or not depending on the intent of the system. As a rule of thumb plive tracks the number of parts that are alive in the system for logging purposes, where as ptotal is used for comparisons and averages, e.g., SGCluster evaluate.
pcount (int) –

Return type

None

_normalize_avg_()[source]

_pretty_print_eq_distr_table(target, rtol, atol)[source]

Pretty prints a PSQL formatted table for visual vector comparison.

Parameters

target (DataFrame) – The DataFrame object to be formatted as PSQL table.
atol (float) – The allowed absolute tolerance.
rtol (float) – The allowed relative tolerance.

Return type

Any

_validate_transition_matrix(m, v_)[source]

Asserts if m is a Markov Matrix.

Verification is done by raising the m to the power of 4096 (just a large number) and checking if all columns of the powered matrix are element-wise equal to the entries of target_distribution.

Parameters

m (DataFrame) – The matrix to be verified.
v_ (DataFrame) – The steady state the m is expected to have.

Returns

True if the matrix converges to the target_distribution, otherwise False. I.e., if m is a markov matrix.

Return type

bool

add_cloud_reference()[source]

Adds a cloud server to the members of the SGCluster.

This method is used when SGCluster membership size becomes compromised and a backup solution using cloud approaches is desired. The idea is that surviving members upload their replicas to the cloud server, e.g., an Amazon S3 instance. See Master method get_cloud_reference() for more details.

Note

This method is virtual.

Return type: None

broadcast_transition_matrix(m)[source]

Slices a matrix and delivers columns to the respective network nodes.

Parameters: m (DataFrame) – A matrix to be broadcasted to the network nodes belonging who are currently members of the Cluster instance.
Return type: None

Note

An optimization could be made that configures a transition matrix for the cluster, independent of of file names, i.e., turn cluster groups into groups persisting multiple files instead of only one, thus reducing simulation spaceoverheads and in real-life scenarios, decreasing the load done to metadata servers, through queries and matrix calculations. For simplicity of implementation each cluster only manages one file.

create_and_bcast_new_transition_matrix()[source]

Helper method that attempts to generate a markov matrix to be sliced and distributed to the SGCluster members.

At most three transition matrices will be generated. The first to be successfully validated is distributed to the network nodes. If all matrices are invalid, the last matrix will be used to prevent infinite loops in the simulation. This is not an issue as eventually the membership of the SGCluster will change, thus, more opportunities to perform a correct swarm guidance behavior will be possible.

Return type: None

equal_distributions()[source]

Asserts if the desired distribution and current distribution are equal.

Equalility is calculated using numpy allclose function which has the following formula:

absolute(`a` - `b`) <= (`atol` + `rtol` * absolute(`b`))

Returns: True if distributions are close enough to be considered equal, otherwise, it returns False.
Return type: bool

evaluate()[source]

Evaluates and logs the health, possibly other parameters, of the Cluster at every epoch.

Return type: None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters: epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.
Returns: False if Cluster failed to persist the file it was responsible for, otherwise True.
Return type: None

maintain(off_nodes)[source]

Evicts any node who is referenced in off_nodes list.

Extends:: app.domain.cluster_groups.Cluster.maintain().

Parameters: off_nodes (List[NodeType]) – The subset of members who disconnected during the current epoch.
Return type: None

membership_maintenance()[source]

Attempts to recruits new network nodes to be members of the cluster.

The method updates both members and _members_view.

Extends:

app.domain.cluster_groups.Cluster.membership_maintenance().

SGCluster.membership_maintenance adds and removes cloud references depending depending on the length of members before maintenance is performed.

Returns: A dictionary that is empty if membership did not change.
Return type: NodeDict

new_desired_distribution(member_ids, member_uptimes)[source]

Sets a new desired distribution for the SGCluster.

Received member_uptimes are normalized to create a stochastic representation of the desired distribution, which can be used by the different transition matrix generation strategies.

Parameters

member_ids (List[str]) – A list of node identifiers who are members of the SGCluster.
member_uptimes (List[float]) – A list of node identifiers.

Return type

List[float]

Note

member_ids and member_uptimes elements at each index should belong to each other, i.e., they should originate from from the same network node.

Returns

A list of floats with normalized uptimes which represent the “reliability” of network nodes.

Parameters

member_ids (List[str]) –
member_uptimes (List[float]) –

Return type

List[float]

new_transition_matrix()[source]

Creates a new transition matrix that is likely to be a Markov Matrix.

Returns: The labeled matrix that has the fastests mixing rate from all the pondered strategies.
Return type: DataFrame

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:: app.domain.cluster_groups.Cluster.nodes_execute().

Returns: A collection of members who disconnected during the current epoch. See app.domain.network_nodes.Node.update_status().
Return type: List[NodeType]

remove_cloud_reference()[source]

Remove cloud references and delete files within it

Note

This method is virtual.

Return type: None

select_fastest_topology(a, v_)[source]

Creates multiple transition matrices and selects the fastest.

The fastest of the created transition matrices corresponds to the one with a faster mixing rate.

Parameters

a (ndarray) – An adjacency matrix that represents the network topology.
v_ (ndarray) – A desired distribution vector that defines the returned matrix steady state property.

Returns

A transition matrix that is likely to be a markov matrix whose steady state is v_, but is not yet validated. See _validate_transition_matrix().

Return type

ndarray

spread_files(replicas, strat='i')[source]

Distributes a collection of FileBlockData objects among the members of the SGCluster.

Overrides:: app.domain.cluster_groups.Cluster.spread_files().

Parameters

replicas (ReplicasDict) – The FileBlockData replicas, without replication.
strat (str) –
Defines how replicas will be initially distributed in the Cluster.

u
Each file block replica in replicas is distributed following a uniform probability vector among members of the cluster group.

a
Each file block replica in replicas is given up to N different members where N is equal to REPLICATION_LEVEL.

i
Each file block replica in replicas with bias towards the ideal steady state distribution. This implementation of differs from app.domain.cluster_groups.Cluster.spread_files(), because it is not necessarely based on node uptime.

Return type

None

class SGClusterExt(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.SGCluster

Represents a group of network nodes persisting a file.

SGClusterExt instances differ from SGCluster because their members are of type SGNodeExt. When combined these classes give nodes the responsibility of collaborating in the detection of faulty members of the SGClusterExt and eventually kicking them out of the group.

complaint_threshold

Reference value that defines the maximum number of complaints a network node can receive before it is evicted from the SGClusterExt.

Type: int

nodes_complaints

A dictionary mapping network node identifiers' to the number of complaints made against them by other members. When complaints becomes bigger than py:py:attr:complaint_threshold the complaintee is evicted from the group.

Type: Dict[str, int]

suspicious_nodes

A dictionary containing the unique node identifiers of known suspicious members and how many epochs have passed since they changed to such status.

Type: Dict[str, int]

_epoch_complaints

A set of unique identifiers formed from the concatenation of node identifiers, to avoid multiple complaint registrations on the same epoch, done by the same source towards the same target. The set is reset every epoch.

Type: set

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters

master (MasterType) – A reference to an Master object that manages the Cluster being initialized.
file_name (str) – The name of the file the Cluster is responsible for persisting.
members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.
sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

complain(complainter, complainee, reason)[source]

Registers a complaint against a possibly offline node.

A unique identifier for the complaint is generated by concatenation of the complainter and the complainee unique identifiers.

Overrides:: app.domain.cluster_groups.Cluster.complain()

Parameters

complainter (str) – The identifier of the complaining SGNodeExt.
complainee (str) – The identifier of the SGNodeExt being complained about.
reason (app.type_hints.HttpResponse) – The http code that led to the complaint.

Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters: epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.
Returns: False if Cluster failed to persist the file it was responsible for, otherwise True.
Return type: None

maintain(off_nodes)[source]

Evicts any network node who has been complained about more than complaint_threshold times.

Overrides:: app.domain.cluster_groups.Cluster.maintain().

Parameters: off_nodes (List[NodeType]) – The subset of members who disconnected during the current epoch.
Return type: None

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:

app.domain.cluster_groups.SGCluster.nodes_execute().

Offline network nodes are considered suspects until enough complaints from other SGNodeExt members are received. This is important because lost parts can not be logged multiple times. Yet suspected network nodes need to be contabilized as offline for simulation purposes without being evicted from the group until they are detected by their peers as being offline.

Returns: A collection of members who disconnected during the current epoch. See app.domain.network_nodes.SGNodeExt.update_status().
Return type: List[NodeType]

class SGClusterPerfect(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.SGCluster

Represents a group of network nodes persisting a file using swarm guidance algorithm.

This implementation assumes nodes never disconnect, there are no disk errors and there is no link loss, i.e., it is used to study properties of the system independently of computing environment.

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters

master (MasterType) – A reference to an Master object that manages the Cluster being initialized.
file_name (str) – The name of the file the Cluster is responsible for persisting.
members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.
sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters: epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.
Returns: False if Cluster failed to persist the file it was responsible for, otherwise True.
Return type: None

new_transition_matrix()[source]

Creates a new transition matrix that is likely to be a Markov Matrix.

Returns: The labeled matrix that has the fastests mixing rate from all the pondered strategies.
Return type: DataFrame

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:: app.domain.cluster_groups.Cluster.nodes_execute().

Returns: A collection of members who disconnected during the current epoch. See app.domain.network_nodes.Node.update_status().
Return type: List[NodeType]

select_fastest_topology(a, v_)[source]

Creates multiple transition matrices and selects the fastest.

The fastest of the created transition matrices corresponds to the one with a faster mixing rate.

Parameters

a (ndarray) – An adjacency matrix that represents the network topology.
v_ (ndarray) – A desired distribution vector that defines the returned matrix steady state property.

Returns

A transition matrix that is likely to be a markov matrix whose steady state is v_, but is not yet validated. See _validate_transition_matrix().

Return type

ndarray

app.domain.master_servers

This module contains domain specific classes that coordinate all app.domain.cluster_groups of a simulation instance. These could simulate centralized authentication servers, file localization or file metadata servers or a bank of currently online and offline storage nodes.

class HDFSMaster(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: app.domain.master_servers.Master

_process_simfile(path, cluster_class, node_class)[source]

Opens and processes the simulation filed referenced in path.

Overrides:

app.domain.master_servers.Master._process_simfile().

The method is exactly the same except for one instruction. The _split_files() is invoked with fixed bsize = 1MB. The reason for this is two-fold:

The default and, thus recommended, block size for the hadoop distributed file system is 128MB. The system is not designed to perform well with small file blocks, but SG requires many file blocks to work, hence being more effective with small block sizes.

Hadoop limits the minimum block size to be 1MB, dfs.namenode.fs-limits.min-block-size. For this reason, we make HDFSMaster split files into 1MB chunks, as that is the closest we would get to our Hive’s default block size in the real world.

The other difference is that the spread strategy is ignored. We are not interested in knowing if the way the files are initially spread affects the time it takes for clusters to achieve a steady-state distribution since in HDFS file block replicas are stationary on data nodes until they die.

Parameters

path (str) – The path to the simulation file. Including extension and parent folders.
cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See app.domain.cluster_groups.
node_class (str) – The name of the class used to instantiate network node instances through reflection. See app.domain.network_nodes.

Return type

None

class Master(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: object

Simulation manager class, some kind of puppet-master. Could represent an authentication server or a monitor that decides along with other Master entities what network nodes are online using consensus algorithms.

origin

The name of the simulation file name that started the simulation process.

Type: str

sid

Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

Type: int

epoch

The simulation’s current epoch.

Type: int

cluster_groups

A collection of cluster groups managed by the Master. Keys are cluster identifiers and values are the cluster instances.

Type: app.type_hints.ClusterDict

network_nodes

A dictionary mapping node identifiers to their instance objects. This collection differs from app.domain.cluster_groups.Cluster.members attribute in the sense that the former network_nodes includes all nodes, both online and offline, available on the entire distributed backup storage system regardless of their participation in any cluster group.

Type: app.type_hints.NodeDict

__init__(simfile_name, sid, epochs, cluster_class, node_class)[source]

Instantiates an Master object.

Parameters

simfile_name (str) – A path to the simulation file to be run by the simulator.
sid (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
epochs (int) – The number of discrete time steps the simulation lasts.
cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See cluster groups module.
node_class (str) – The name of the class used to instantiate network node instances through reflection. See network nodes module.

Return type

None

_create_network_nodes(json, node_class)[source]

Helper method that instantiates all network nodes that are specified in the simulation file.

Parameters

json (Dict[str, Any]) – The simulation file in JSON dictionary object format.
node_class (str) – The type of network node to create.

Return type

None

_new_cluster_group(cluster_class, size, fname)[source]

Helper method that initializes a new Cluster group.

Parameters

cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See cluster groups module.
size (int) – The cluster's initial memberhip size.
fname (str) – The name of the fille being stored in the cluster.

Returns

The Cluster instance.

Return type

ClusterType

_new_network_node(node_class, nid, node_uptime)[source]

Helper method that initializes a new Node.

Parameters

node_class (str) – The name of the class used to instantiate network node instances through reflection. See network nodes module.
nid (str) – An id that will uniquely identifies the network node.
node_uptime (str) – A float value in string representation that defines the uptime of the network node.

Returns

The Node instance.

Return type

NodeType

_process_simfile(path, cluster_class, node_class)[source]

Opens and processes the simulation filed referenced in path.

This method opens the file reads the json data inside it. Combined with app.environment_settings it sets up the class instances to be used during the simulation (e.g., cluster groups and network nodes). This method also be splits the file to be persisted in the simulation into multiple blocks or chunks and for triggering the initial file spreading mechanism.

Parameters

path (str) – The path to the simulation file. Including extension and parent folders.
cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See app.domain.cluster_groups.
node_class (str) – The name of the class used to instantiate network node instances through reflection. See app.domain.network_nodes.

Return type

None

_split_files(fname, cluster, bsize)[source]

Helper method that splits the files into multiple blocks to be persisted in a cluster group.

Parameters

fname (str) – The name of the file located in SHARED_ROOT folder to be read and splitted.
cluster (ClusterType) – A reference to the cluster group whose members will be responsible for ensuring the file specified in fname becomes durable.
bsize (int) – The maximum amount of bytes each file block can have.

Returns

A dictionary in which the keys are integers and values are file blocks, whose attribute number is the key.

Return type

ReplicasDict

execute_simulation()[source]

Starts the simulation processes.

Return type: None

find_online_nodes(n=1, blacklist=None)[source]

Finds n network nodes who are currently registered at the Master and whose status is online.

Parameters

n (int) – How many network node references the requesting entity wants to find.
blacklist (NodeDict) – A collection of nodes identifiers and their object instances, which specify nodes the requesting entity has no interest in.

Returns

A collection of network nodes which is at most as big as n, which does not include any node named in blacklist.

Return type

NodeDict

MAX_EPOCHS: Optional[int] = None

MAX_EPOCHS_PLUS_ONE: Optional[int] = None

class NewscastMaster(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: app.domain.master_servers.Master

__init__(simfile_name, sid, epochs, cluster_class, node_class)[source]

Instantiates an Master object.

Parameters

simfile_name (str) – A path to the simulation file to be run by the simulator.
sid (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.
epochs (int) – The number of discrete time steps the simulation lasts.
cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See cluster groups module.
node_class (str) – The name of the class used to instantiate network node instances through reflection. See network nodes module.

Return type

None

_process_simfile(path, cluster_class, node_class)[source]

Opens and processes the simulation filed referenced in path.

Overrides:

app.domain.master_servers.Master._process_simfile().

Newscast is a gossip-based P2P network. We assume erasure-coding would be used in this scenario and thus, for simplicity, we divide the specified file’s size into multiple 1/N, where N is the number of network nodes in the system.

Note

This class, NewscastCluster and NewscastNode were created to test our simulators performance, concerning the amount of supported simultaneous network nodes in a simulation. We do not actually care if the created file blocks are lost as the network nodes job in the simulation is to carry out the protocol defined in PeerSim’s AverageFunction. PeerSim uses configuration Example 2 provided in release 1.0.5, as a means of testing the simulator performance, according to this Ms.C. dissertation by J. Neto. This configuration uses Newscast protocol with AverageFunction and periodic monitoring of the system state. We implement our version of Adaptaive Peer Sampling with Newscast by N. Tölgyesi and M. Jelasity, to avoid the effort of translating PeerSim’s code.

Parameters

path (str) – The path to the simulation file. Including extension and parent folders.
cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See app.domain.cluster_groups.
node_class (str) – The name of the class used to instantiate network node instances through reflection. See app.domain.network_nodes.

Return type

None

class SGMaster(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: app.domain.master_servers.Master

get_cloud_reference()[source]

Use to obtain a reference to 3rd party cloud storage provider

The cloud storage provider can be used to temporarely host files belonging to cluster clusters in bad conditions that may compromise the file durability of the files they are responsible for persisting.

Note

This method is virtual.

Returns: A pointer to the cloud server, e.g., an IP Address.
Return type: str

_PersistentingDict: Dict[str, Dict[str, Union[List[str], str]]]

app.domain.network_nodes

This module contains domain specific classes that represent network nodes responsible for the storage of file blocks. These could be reliable servers or P2P nodes.

class HDFSNode(uid, uptime)[source]

Bases: app.domain.network_nodes.Node

Represents a data node in the Hadoop Distribute File System.

execute_epoch(cluster, fid)[source]

Instructs the HDFSNode instance to execute the epoch.

The method iterates files held in disk and attempts to corrupt them silently. In HDFS file blocks’ sha256 are only verified when a user or client accesses the remote replica. Hence, no replication epoch is set up when a corruption occurs. The corruption is still logged in the output file.

Overrides:: app.domain.network_nodes.Node.execute_epoch().

Parameters

cluster (ClusterType) – A reference to the Cluster that invoked the Node method.
fid (str) – The file name identifier of the file being simulated.

Return type

None

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Replicas are sent selectively in descending order to the most reliable Nodes in the Cluster down to the least reliable.

Overrides:: app.domain.network_nodes.Node.replicate_part().

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters

cluster (ClusterType) – A reference to the Cluster that will deliver the new replica.
replica (FileBlockData) – The file block replica to be delivered.

Return type

None

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Overrides:: app.domain.network_nodes.Node.update_status().

Returns: The the status of the Node.
Return type: Status

class NewscastNode(uid, uptime)[source]

Bases: app.domain.network_nodes.Node

Represents a Peer running Newscast protocol, using shuffling techniques to exchange acquaintances with other network peers and performing peer degree aggregation using AverageFunction.

view: A partial view of the P2P network. View is a collection of network nodes, the NewscastNode instance may contact other than himself. Keys are NewscastNode instances, and values are their age in the dictionary. A key-value pair is commonly referenced as a descriptor.

aggregation_value: Stores the aggregation value. The type of aggregation_value is defined by the body of the aggregate() method.

__init__(uid, uptime)[source]

Instantiates a Node object.

These are network nodes responsible for persisting file block replicas.

Parameters

uid (str) – An unique identifier for the Node instance.
uptime (float) – The availability of the Node instance.

Return type

None

_merge(a, b)[source]

Merges two network views. If a node descriptor exists in both views, the most recent descriptor is kept.

Parameters

a (_NetworkView) – A dictionary where keys are network nodes and values are their respective age in the view.
b (_NetworkView) – A dictionary where keys are network nodes and values are their respective age in the view.

Returns

The set union of both views with only the most up to date descriptors.

Return type

_NetworkView

_select_view(view_buffer)[source]

Reduces the size of the view to a predefined maximum size.

:param A dictionary where keys are network nodes: :param and values are their respective age in the view.:

Returns: The view_buffer with at most max_view_size descriptors.
Parameters: view_buffer (_NetworkView) –
Return type: _NetworkView

add_neighbor(node)[source]

Adds a new network node to the node instance’s view.

If the view is full, the eldest node is replaced with the new node. Otherwise, the new NewscastNode is added to the instance’s view with age zero, unless the entry is already in view or the node is the current NewscastNode instance.

Returns: True if node was successfuly added, False otherwise.
Parameters: node (app.domain.network_nodes.NewscastNode) –
Return type: bool

aggregate(node=None)[source]

The network node instance contacts another node from his view, then, both nodes assign the mean of their degrees to aggregation_value.

Parameters: node (Optional[app.domain.network_nodes.NewscastNode]) – When node is None a random NewscastNode is selected from view. When specified to be contacted is the one referenced in the parameter.
Return type: None

execute_epoch(cluster, fid)[source]

Instructs the NewscastNode instance to execute the epoch.

During the execution of the epoch, the NewscastNode instance randomly selects another NewscastNode who belongs to his view and aggregates their degree using the Average Function. Sometimes, during the epoch, the NewscastNode instance will also perform shuffling with the selected target.

Overrides:: app.domain.network_nodes.Node.execute_epoch().

Parameters

cluster (ClusterType) – A reference to the Cluster that invoked the Node method.
fid (str) – The file name identifier of the file being simulated.

Return type

None

get_degree()[source]

Counts the number of descriptors in the node’s view.

Returns: The degree of the NewscastNode instance.
Return type: int

get_node()[source]

Gets a random node from the current network view.

Each candidate NewscastNode to be returned is first pinged, if no answer is obtained, another node is selected as a candidate by iterating a list representation of view and the previous candidate is removed from the view.

Note

Newscast should always return a random node, thus iteration should not be used, but this search is more efficient and readable.

Returns: The selected NewscastNode.
Return type: Optional[app.domain.network_nodes.NewscastNode]

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Similar to send_part() but with slightly different instructions. In particular new replicas can not be corrupted at the current node, at the current epoch.

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters

cluster (ClusterType) – A reference to the Cluster that will deliver the new replica.
replica (FileBlockData) – The file block replica to be delivered.

Raises

NotImplementedError – When children of Node do not implement the abstract method.

Return type

None

shuffle(node)[source]

Starts a shuffle process that merges and crops two nodes’ views at the current node and at the destination node.

The final view consists of most up to date descriptors from both views up to a maximum of max_view_size descriptors.

Parameters: node (app.domain.network_nodes.NewscastNode) – The node to be contacted for shuffling.
Return type: None

shuffle_request(senders_view)[source]

Merges and crops two nodes’ views at the current node.

The final view consists of most up to date descriptors from both views up to a maximum of max_view_size descriptors.

Parameters: senders_view (_NetworkView) – A dictionary where keys are network nodes and values are their respective age in the view.
Returns: A view and a fresh descriptor from the NewscastNode instance, before it is merged with the requestor’s view.
Return type: _NetworkView

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Overrides:: app.domain.network_nodes.Node.update_status().

Returns: The the status of the Node.
Return type: Status

class Node(uid, uptime)[source]

Bases: object

This class contains basic network node functionality that should always be useful.

id

A unique identifier for the Node instance.

Type: str

uptime

The amount of time the Node is expected to remain online without disconnecting. Current uptime implementation is based on availability percentages.

Note

Current implementation expects network nodes joining a cluster group to remain online for approximately:

time_to_live = uptime * MAX_EPOCHS.

However, a network node who belongs to multiple cluster groups may disconnect earlier than that, i.e., network nodes remain online time_to_live after their first operation on the distributed backup system.

Type: float

status

Indicates if the Node instance is online or offline. In later releases this could also contain a ‘suspect’ status.

Type: app.domain.helpers.enums.Status

suspicious_replies

Collection that contains http codes that when received, trigger complaints to monitors about the replier.

Type: set

files

A dictionary mapping file names to dictionaries of file block identifiers and their respective contents, i.e., the file block replicas hosted at the Node.

Type: Dict[str, ReplicasDict]

__init__(uid, uptime)[source]

Instantiates a Node object.

These are network nodes responsible for persisting file block replicas.

Parameters

uid (str) – An unique identifier for the Node instance.
uptime (float) – The availability of the Node instance.

Return type

None

discard_part(fid, number, corrupt=False, cluster=None)[source]

Safely deletes a part from the SGNode instance’s disk.

Parameters

fid (str) – Name of the file the file block replica belongs to.
number (int) – The part number that uniquely identifies the file block.
corrupt (bool) – If discard is being invoked due to identified file block corruption, e.g., Sha256 does not match the expected.
cluster (ClusterType) – Cluster that will set the replication epoch or mark the simulation as failed.

Return type

None

execute_epoch(cluster, fid)[source]

Instructs the Node instance to execute the epoch.

Parameters

cluster (ClusterType) – A reference to the Cluster that invoked the Node method.
fid (str) – The file name identifier of the file being simulated.

Raises

NotImplementedError – When children of Node do not implement the abstract method.

Return type

None

get_file_parts(fid)[source]

Gets collection of file parts that correspond to the named file.

Parameters: fid (str) – The file name identifier that designates the file block replicas to be retrieved.
Returns: A dictionary where keys are file block numbers and values are file block replicas
Return type: ReplicasDict

get_file_parts_count(fid)[source]

Counts the number of file block replicas of a specific file owned by the Node.

Parameters: fid (str) – The file name identifier that designates the file block replicas to be counted.
Returns: The number of counted replicas.
Return type: int

is_suspect()[source]

Returns True if the node is behaving suspiciously, else False.

Return type: bool

is_up()[source]

Returns True if the node is online, else False.

Return type: bool

receive_part(replica)[source]

Endpoint for file block replica reception.

The Node stores a new file block replica in files if he does not have a replica with same identifier.

Parameters: replica (domain.helpers.smart_dataclasses.FileBlockData) – The file block replica to be received by Node.
Returns: If upon integrity verification the sha256 hashvalue differs from the expected, the worker replies with a BAD_REQUEST. If the Node already owns a replica with the same identifier it replies with NOT_ACCEPTABLE. Otherwise it replies with a OK, i.e., the delivery is successful.
Return type: HttpCodes

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Similar to send_part() but with slightly different instructions. In particular new replicas can not be corrupted at the current node, at the current epoch.

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters

cluster (ClusterType) – A reference to the Cluster that will deliver the new replica.
replica (FileBlockData) – The file block replica to be delivered.

Raises

NotImplementedError – When children of Node do not implement the abstract method.

Return type

None

send_part(cluster, destination, replica)[source]

Attempts to send a replica to some other network node.

Parameters

cluster (ClusterType) – A reference to the Cluster that will deliver the new replica. In a real world implementation this argument would not make sense, but we use it to facilitate simulation management and environment logging.
destination (str) – The name, address or another unique identifier of the node that will receive the file block replica.
replica (FileBlockData) – The file block container to be sent to some other worker.

Returns

An http code.

Return type

HttpCodes

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Returns: The the status of the Node.
Return type: Status

class SGNode(uid, uptime)[source]

Bases: app.domain.network_nodes.Node

Represents a network node that executes a Swarm Guidance algorithm.

clusters: A collection of cluster groups the SGNode is a member of.

routing_table

Contains the information required to appropriately route file block blocks to other SGNode instances.

Type: Dict[str, DataFrame]

__init__(uid, uptime)[source]

Instantiates a Node object.

These are network nodes responsible for persisting file block replicas.

Parameters

uid (str) – An unique identifier for the Node instance.
uptime (float) – The availability of the Node instance.

Return type

None

execute_epoch(cluster, fid)[source]

Instructs the Node instance to execute the epoch.

The method iterates all file block blocks in files and independently decides if they should be sent to another SGNode by following the probabilities in routing_table column vectors.

Overrides:: app.domain.network_nodes.Node.execute_epoch().

Parameters

cluster (ClusterType) – A reference to the Cluster that invoked the Node method.
fid (str) – The file name identifier of the file being simulated.

Return type

None

remove_file_routing(fid)[source]

Removes a file name from the SGNode routing table.

This method is called when a SGNode is evicted from the cluster group and results in the deletion from disk of all file block replicas with identifier fid.

Parameters: fid (str) – The file name identifier of the file whose routing is being eliminated.
Return type: None

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Similar to send_part() but with slightly different instructions. In particular new replicas can not be corrupted at the current node, at the current epoch. The replicas are also sent selectively in descending order to the most reliable Nodes in the Cluster down to the least reliable. Whereas send_part(). follows stochastic swarm guidance routing.

Overrides:: app.domain.network_nodes.Node.replicate_part().

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters

cluster (ClusterType) – A reference to the Cluster that will deliver the new replica.
replica (FileBlockData) – The file block replica to be delivered.

Return type

None

select_destination(fid)[source]

Selects a random message destination according to routing_table probabilities for the specified file name.

Parameters: fid (str) – The file name identifier to obtain the proper routing_table for destination selection.
Returns: The name or address of the selected destination.
Return type: str

set_file_routing(fid, v_)[source]

Maps a file name identifier with a transition column vector used for file block replica routing.

Parameters

fid (str) – The file name identifier of the file whose routing is being configured.
v_ (Union[Series, DataFrame]) – A column vector with probabilities that dictate the odds of sending file block blocks belonging to the file with specified id to other Cluster members also working on the persistence of the file block blocks.

Raises

ValueError – If transition_vector is not a DataFrame and cannot be casted to it.

Return type

None

class SGNodeExt(uid, uptime)[source]

Bases: app.domain.network_nodes.SGNode

Represents a network node that executes a Swarm Guidance algorithm.

SGNodeExt instances differ from SGNode in the sense that the latter does not monitor the peers belonging to his cluster groups, concerning their connectivity status or suspicious behaviours.

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Overrides:: app.domain.network_nodes.Node.update_status().

Returns: The the status of the Node.
Return type: Status

_NetworkView: Dict[Union[str, app.domain.network_nodes.Node], int]