app.domain

Submodules

app.domain.cluster_groups

This module contains domain specific classes that represent groups of storage nodes.

class Cluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: object

Represents a group of network nodes ensuring the durability of a file.

id

A unique identifier of the Cluster instance.

Type

str

current_epoch

The simulation’s current epoch.

Type

int

corruption_chances

A two-element list containing the probability of FileBlockData being corrupted and not being corrupted, respectively. See get_disk_error_chances() for corruption chance configuration.

Type

List[float]

master

A reference to a server that coordinates or monitors the Cluster.

Type

Master

members

A collection of network nodes that belong to the Cluster.

Type

NodeDict

_members_view

A list representation of the nodes in members.

Type

List[NodeType]

file

A reference to FileData object that represents the file being persisted by the Cluster instance.

Type

FileData

critical_size

Minimum number of network nodes plus required to exist in the Cluster to assure the target replication level.

Type

int

sufficient_size

Sum of critical_size and the number of nodes expected to fail between two successive recovery phases.

Type

int

original_size

The initial and theoretically optimal Cluster size.

Type

int

redundant_size

Application-specific parameter, which indicates that membership of the Cluster must be pruned.

Type

int

running

Indicates if the Cluster instance is active. Used by Master to manage the simulation processes.

Type

bool

_membership_changed

Flag indicates wether or not _members_view needs to be updated during membership_maintenance(). The variable is set to false at the beggining of every epoch and set to true if the length of off_nodes list return by nodes_execute() is bigger than zero.

Type

bool

_recovery_epoch_sum

Helper attribute that facilitates the storage of the sum of the values returned by all set_recovery_epoch() method calls. Important for logging purposes.

Type

int

_recovery_epoch_calls

Helper attribute that facilitates the storage of the sum of the values returned by all set_recovery_epoch() method calls throughout the current_epoch.

Type

int

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters
  • master (MasterType) – A reference to an Master object that manages the Cluster being initialized.

  • file_name (str) – The name of the file the Cluster is responsible for persisting.

  • members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.

  • sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

_get_new_members()[source]

Helper method that searches for possible network node by querying the master of the Cluster.

Returns

A dictionary mapping where keys are node identifiers and values are node instances.

Return type

NodeDict

_log_evaluation(plive, ptotal=- 1)[source]

Helper that collects Cluster data and registers it on a logger object.

Parameters
  • plive (int) – The number of existing parts in the cluster at the simulation’s current epoch at online or suspect nodes.

  • ptotal (int) – The number of existing parts in the cluster at the simulation’s current epoch. This parameter is optional and may be used or not depending on the intent of the system. As a rule of thumb plive tracks the number of parts that are alive in the system for logging purposes, where as ptotal is used for comparisons and averages, e.g., SGCluster evaluate.

Return type

None

_set_fail(message)[source]

Ends the Cluster instance simulation.

Sets running to False and orders FileData to write collected logs to disk and close it’s out_file stream.

Parameters

message (str) – A short explanation of why the Cluster terminated early.

Return type

None

_setup_epoch(epoch)[source]

Initializes some attributes cluster attributes at the start of an epoch.

This method also forces all of the Clusters members to update their connectivity status before any node is instructed to execute.

Parameters

epoch (int) – The simulation’s current epoch.

Return type

None

complain(complainter, complainee, reason)[source]

Registers a complaint against a possibly offline node.

Note

This method provides no default functionality and should be overridden in sub classes if required.

Parameters
Return type

None

evaluate()[source]

Evaluates and logs the health, possibly other parameters, of the Cluster at every epoch.

Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters

epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.

Returns

False if Cluster failed to persist the file it was responsible for, otherwise True.

Return type

None

get_cluster_status()[source]

Determines the Cluster’s status based on the length of the current members list.

Returns

The status of the Cluster as a string.

Return type

str

get_node()[source]

Retrives a random node from the members of the cluster group, whose status is likely to be online.

Returns

A random network node from members.

Return type

NodeType

maintain(off_nodes)[source]

Offers basic maintenance functionality for Cluster types.

If off_nodes list param as at least one node reference, _membership_changed is set to True.

Parameters

off_nodes (List[th.NodeType]) – A possibly empty of offline nodes.

Return type

None

membership_maintenance()[source]

Attempts to recruits new network nodes to be members of the cluster.

The method updates both members and _members_view.

Returns

A dictionary that is empty if membership did not change.

Return type

NodeDict

nodes_execute()[source]

Queries all members to execute the epoch.

This method logs the amount of lost replicas throughout current_epoch according to the members who went offline and the FileBlockData replicas they posssed and is responsible for setting a replication epoch. Similarly it logs the number of members who disconnected.

Returns

List of members that disconnected during the current_epoch. See app.domain.network_nodes.Node.update_status().

Return type

List[NodeType]

route_part(sender, receiver, replica, is_fresh=False)[source]

Sends a file block replica to some other network node in members.

Parameters
  • sender (str) – An identifier of the network node who is sending the message.

  • receiver (str) – The destination network node identifier.

  • replica (FileBlockData) – The file block replica to be sent specified destination: receiver.

  • is_fresh (bool) – Prevents recently created replicas from being corrupted, since they are not likely to be corrupted in disk. This argument facilitates simulation.

Returns

An http code sent by the receiver.

Return type

int

set_replication_epoch(replica)[source]

Delegates to set_replication_epoch().

Parameters

replica (domain.helpers.smart_dataclasses.FileBlockData) – The file block replica that was lost.

Return type

None

spread_files(replicas, strat='i')[source]

Distributes a collection of file block replicas among the members of the cluster group.

Parameters
  • replicas (ReplicasDict) – The FileBlockData replicas, without replication.

  • strat (str) –

    Defines how replicas will be initially distributed in the Cluster. Unless overridden in children of this class the received value of strat will be ignored and will always be set to the default value i.

    i

    This strategy creates a probability vector containing the normalization of network nodes uptimes' and uses that vector to randomly select which node will receive each replica. There is a bias to give more replicas to the most resillent nodes which results from using the created probability vector.

Return type

None

class HDFSCluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.Cluster

Represents a group of network nodes ensuring the durability of a file in a Hadoop Distributed File System scenario.

Note

Members of HDFSCluster are of type HDFSNode, they do not perform swarm guidance behaviors and instead report with regular heartbeats to their monitors. This class could be a NameNode Server in HDFS or a master server in GFS.

suspicious_nodes

A set containing the identifiers of suspicious network nodes.

Type

set

data_node_heartbeats

A dictionary mapping node identifiers to the number of complaints made against them. Each node has five lives. When they miss five beats in a row, i.e., when the dictionary value count is zero, they are evicted from the cluster.

Type

Dict[str, int]

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters
  • master (MasterType) – A reference to an Master object that manages the Cluster being initialized.

  • file_name (str) – The name of the file the Cluster is responsible for persisting.

  • members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.

  • sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

evaluate()[source]

Logs the number of existing replicas in the HDFSCluster.

Overrides:

app.domain.cluster_groups.Cluster.evaluate().

Return type

None

maintain(off_nodes)[source]

Evicts any network node whose heartbeats in data_node_heartbeats reached zero.

Extends:

app.domain.cluster_groups.Cluster.execute_epoch().

Parameters

off_nodes (List[NodeType]) – The subset of members who disconnected during the current epoch.

Return type

None

membership_maintenance()[source]

Attempts to recruits new network nodes to be members of the cluster.

The method updates both members and _members_view.

Returns

A dictionary that is empty if membership did not change.

Return type

NodeDict

nodes_execute()[source]

Queries all members to execute the epoch.

Overrides:

app.domain.cluster_groups.Cluster.nodes_execute()

Returns

A collection of members who disconnected during the current epoch. See app.domain.network_nodes.HDFSNode.update_status().

Return type

List[NodeType]

class NewscastCluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.Cluster

Represents a P2P network of nodes performing mean degree aggregation, while simultaneously using Newscast for view shuffling.

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters
  • master (MasterType) – A reference to an Master object that manages the Cluster being initialized.

  • file_name (str) – The name of the file the Cluster is responsible for persisting.

  • members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.

  • sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

_setup_epoch(epoch)[source]

Initializes some attributes cluster attributes at the start of an epoch.

Extends:

app.domain.cluster_groups.Cluster._setup_epoch()

Parameters

epoch (int) – The simulation’s current epoch.

Return type

None

evaluate()[source]

Prints the epoch’s aggregated peer degree, to the command-line interface.

Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters

epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.

Returns

False if Cluster failed to persist the file it was responsible for, otherwise True.

Return type

None

log_aggregation(value)[source]
Parameters

value (float) –

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:

app.domain.cluster_groups.Cluster.nodes_execute().

Note:

NewscasterCluster.nodes_execute always returns None.

Returns

A collection of members who disconnected during the current epoch. See app.domain.network_nodes.NewscastNode.update_status().

Return type

List[NodeType]

spread_files(replicas, strat='o')[source]

Distributes a collection of file block replicas among the members of the cluster group.

Overrides:

app.dommain.cluster_groups.Cluster.spread_files()

Parameters
  • replicas (ReplicasDict) – The FileBlockData replicas, without replication.

  • strat (str) –

    Defines how replicas will be initially distributed in the Cluster. Unless overridden in children of this class the received value of strat will be ignored and will always be set to the default value o.

    o

    This strategy assumes erasure-coding is being used and that each network node will have no more than one encoded block, i.e., replication level is always equal to one. Note however, that if there are more encoded blocks than there are network nodes, some of these nodes might end up possessing an excessive amount of blocks.

Return type

None

wire_k_out()[source]

Creates a random directed P2P topology.

The initial cache size of each network node, is at most as big as NEWSCAST_CACHE_SIZE.

Note

The topology does not have self loops, because add_neighbor() does not accept node self addition to view. In rare occasions, the selected node out-going edges might all be invalid, this should be a non-issue, as the nodes will eventually join the overaly throughout the simulation.

class SGCluster(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.Cluster

Represents a group of network nodes persisting a file using swarm guidance algorithm.

v_

Density distribution cluster members must achieve with independent realizations for ideal persistence of the file.

Type

DataFrame

cv_

Tracks the file current density distribution, updated at each epoch.

Type

DataFrame

avg_

Tracks the file average density distribution. Used to assert if throughout the life time of a cluster, the desired density distribution v_ was achieved on average. Differs from cv_ because cv_ is used for instantaneous convergence comparison.

Type

DataFrame

_timer

Used as a logical clock to divide the entries of avg_ when a topology changes.

Type

int

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters
  • master (MasterType) – A reference to an Master object that manages the Cluster being initialized.

  • file_name (str) – The name of the file the Cluster is responsible for persisting.

  • members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.

  • sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

_log_evaluation(pcount, ptotal=- 1)[source]

Helper that collects Cluster data and registers it on a logger object.

Parameters
  • plive – The number of existing parts in the cluster at the simulation’s current epoch at online or suspect nodes.

  • ptotal (int) – The number of existing parts in the cluster at the simulation’s current epoch. This parameter is optional and may be used or not depending on the intent of the system. As a rule of thumb plive tracks the number of parts that are alive in the system for logging purposes, where as ptotal is used for comparisons and averages, e.g., SGCluster evaluate.

  • pcount (int) –

Return type

None

_normalize_avg_()[source]
_pretty_print_eq_distr_table(target, rtol, atol)[source]

Pretty prints a PSQL formatted table for visual vector comparison.

Parameters
  • target (DataFrame) – The DataFrame object to be formatted as PSQL table.

  • atol (float) – The allowed absolute tolerance.

  • rtol (float) – The allowed relative tolerance.

Return type

Any

_validate_transition_matrix(m, v_)[source]

Asserts if m is a Markov Matrix.

Verification is done by raising the m to the power of 4096 (just a large number) and checking if all columns of the powered matrix are element-wise equal to the entries of target_distribution.

Parameters
  • m (DataFrame) – The matrix to be verified.

  • v_ (DataFrame) – The steady state the m is expected to have.

Returns

True if the matrix converges to the target_distribution, otherwise False. I.e., if m is a markov matrix.

Return type

bool

add_cloud_reference()[source]

Adds a cloud server to the members of the SGCluster.

This method is used when SGCluster membership size becomes compromised and a backup solution using cloud approaches is desired. The idea is that surviving members upload their replicas to the cloud server, e.g., an Amazon S3 instance. See Master method get_cloud_reference() for more details.

Note

This method is virtual.

Return type

None

broadcast_transition_matrix(m)[source]

Slices a matrix and delivers columns to the respective network nodes.

Parameters

m (DataFrame) – A matrix to be broadcasted to the network nodes belonging who are currently members of the Cluster instance.

Return type

None

Note

An optimization could be made that configures a transition matrix for the cluster, independent of of file names, i.e., turn cluster groups into groups persisting multiple files instead of only one, thus reducing simulation spaceoverheads and in real-life scenarios, decreasing the load done to metadata servers, through queries and matrix calculations. For simplicity of implementation each cluster only manages one file.

create_and_bcast_new_transition_matrix()[source]

Helper method that attempts to generate a markov matrix to be sliced and distributed to the SGCluster members.

At most three transition matrices will be generated. The first to be successfully validated is distributed to the network nodes. If all matrices are invalid, the last matrix will be used to prevent infinite loops in the simulation. This is not an issue as eventually the membership of the SGCluster will change, thus, more opportunities to perform a correct swarm guidance behavior will be possible.

Return type

None

equal_distributions()[source]

Asserts if the desired distribution and current distribution are equal.

Equalility is calculated using numpy allclose function which has the following formula:

absolute(`a` - `b`) <= (`atol` + `rtol` * absolute(`b`))
Returns

True if distributions are close enough to be considered equal, otherwise, it returns False.

Return type

bool

evaluate()[source]

Evaluates and logs the health, possibly other parameters, of the Cluster at every epoch.

Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters

epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.

Returns

False if Cluster failed to persist the file it was responsible for, otherwise True.

Return type

None

maintain(off_nodes)[source]

Evicts any node who is referenced in off_nodes list.

Extends:

app.domain.cluster_groups.Cluster.maintain().

Parameters

off_nodes (List[NodeType]) – The subset of members who disconnected during the current epoch.

Return type

None

membership_maintenance()[source]

Attempts to recruits new network nodes to be members of the cluster.

The method updates both members and _members_view.

Extends:

app.domain.cluster_groups.Cluster.membership_maintenance().

SGCluster.membership_maintenance adds and removes cloud references depending depending on the length of members before maintenance is performed.

Returns

A dictionary that is empty if membership did not change.

Return type

NodeDict

new_desired_distribution(member_ids, member_uptimes)[source]

Sets a new desired distribution for the SGCluster.

Received member_uptimes are normalized to create a stochastic representation of the desired distribution, which can be used by the different transition matrix generation strategies.

Parameters
Return type

List[float]

Note

member_ids and member_uptimes elements at each index should belong to each other, i.e., they should originate from from the same network node.

Returns

A list of floats with normalized uptimes which represent the “reliability” of network nodes.

Parameters
  • member_ids (List[str]) –

  • member_uptimes (List[float]) –

Return type

List[float]

new_transition_matrix()[source]

Creates a new transition matrix that is likely to be a Markov Matrix.

Returns

The labeled matrix that has the fastests mixing rate from all the pondered strategies.

Return type

DataFrame

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:

app.domain.cluster_groups.Cluster.nodes_execute().

Returns

A collection of members who disconnected during the current epoch. See app.domain.network_nodes.Node.update_status().

Return type

List[NodeType]

remove_cloud_reference()[source]

Remove cloud references and delete files within it

Note

This method is virtual.

Return type

None

select_fastest_topology(a, v_)[source]

Creates multiple transition matrices and selects the fastest.

The fastest of the created transition matrices corresponds to the one with a faster mixing rate.

Parameters
  • a (ndarray) – An adjacency matrix that represents the network topology.

  • v_ (ndarray) – A desired distribution vector that defines the returned matrix steady state property.

Returns

A transition matrix that is likely to be a markov matrix whose steady state is v_, but is not yet validated. See _validate_transition_matrix().

Return type

ndarray

spread_files(replicas, strat='i')[source]

Distributes a collection of FileBlockData objects among the members of the SGCluster.

Overrides:

app.domain.cluster_groups.Cluster.spread_files().

Parameters
Return type

None

class SGClusterExt(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.SGCluster

Represents a group of network nodes persisting a file.

SGClusterExt instances differ from SGCluster because their members are of type SGNodeExt. When combined these classes give nodes the responsibility of collaborating in the detection of faulty members of the SGClusterExt and eventually kicking them out of the group.

complaint_threshold

Reference value that defines the maximum number of complaints a network node can receive before it is evicted from the SGClusterExt.

Type

int

nodes_complaints

A dictionary mapping network node identifiers' to the number of complaints made against them by other members. When complaints becomes bigger than py:py:attr:complaint_threshold the complaintee is evicted from the group.

Type

Dict[str, int]

suspicious_nodes

A dictionary containing the unique node identifiers of known suspicious members and how many epochs have passed since they changed to such status.

Type

Dict[str, int]

_epoch_complaints

A set of unique identifiers formed from the concatenation of node identifiers, to avoid multiple complaint registrations on the same epoch, done by the same source towards the same target. The set is reset every epoch.

Type

set

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters
  • master (MasterType) – A reference to an Master object that manages the Cluster being initialized.

  • file_name (str) – The name of the file the Cluster is responsible for persisting.

  • members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.

  • sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

complain(complainter, complainee, reason)[source]

Registers a complaint against a possibly offline node.

A unique identifier for the complaint is generated by concatenation of the complainter and the complainee unique identifiers.

Overrides:

app.domain.cluster_groups.Cluster.complain()

Parameters
Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters

epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.

Returns

False if Cluster failed to persist the file it was responsible for, otherwise True.

Return type

None

maintain(off_nodes)[source]

Evicts any network node who has been complained about more than complaint_threshold times.

Overrides:

app.domain.cluster_groups.Cluster.maintain().

Parameters

off_nodes (List[NodeType]) – The subset of members who disconnected during the current epoch.

Return type

None

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:

app.domain.cluster_groups.SGCluster.nodes_execute().

Offline network nodes are considered suspects until enough complaints from other SGNodeExt members are received. This is important because lost parts can not be logged multiple times. Yet suspected network nodes need to be contabilized as offline for simulation purposes without being evicted from the group until they are detected by their peers as being offline.

Returns

A collection of members who disconnected during the current epoch. See app.domain.network_nodes.SGNodeExt.update_status().

Return type

List[NodeType]

class SGClusterPerfect(master, file_name, members, sim_id=0, origin='')[source]

Bases: app.domain.cluster_groups.SGCluster

Represents a group of network nodes persisting a file using swarm guidance algorithm.

This implementation assumes nodes never disconnect, there are no disk errors and there is no link loss, i.e., it is used to study properties of the system independently of computing environment.

__init__(master, file_name, members, sim_id=0, origin='')[source]

Instantiates an Cluster object

Parameters
  • master (MasterType) – A reference to an Master object that manages the Cluster being initialized.

  • file_name (str) – The name of the file the Cluster is responsible for persisting.

  • members (NodeDict) – A dictionary where keys are node identifiers and values are their instance objects.

  • sim_id (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • origin (str) – The name of the simulation file name that started the simulation process.

Return type

None

execute_epoch(epoch)[source]

Orders all members to execute their epoch.

Note

If the Cluster terminates early, before it reaches MAX_EPOCHS, nothing should be logged in LoggingData at the specified epoch to avoid skewing previously collected results.

Parameters

epoch (int) – The epoch the Cluster should currently be in, according to it’s managing master entity.

Returns

False if Cluster failed to persist the file it was responsible for, otherwise True.

Return type

None

new_transition_matrix()[source]

Creates a new transition matrix that is likely to be a Markov Matrix.

Returns

The labeled matrix that has the fastests mixing rate from all the pondered strategies.

Return type

DataFrame

nodes_execute()[source]

Queries all network node members execute the epoch.

Overrides:

app.domain.cluster_groups.Cluster.nodes_execute().

Returns

A collection of members who disconnected during the current epoch. See app.domain.network_nodes.Node.update_status().

Return type

List[NodeType]

select_fastest_topology(a, v_)[source]

Creates multiple transition matrices and selects the fastest.

The fastest of the created transition matrices corresponds to the one with a faster mixing rate.

Parameters
  • a (ndarray) – An adjacency matrix that represents the network topology.

  • v_ (ndarray) – A desired distribution vector that defines the returned matrix steady state property.

Returns

A transition matrix that is likely to be a markov matrix whose steady state is v_, but is not yet validated. See _validate_transition_matrix().

Return type

ndarray

app.domain.master_servers

This module contains domain specific classes that coordinate all app.domain.cluster_groups of a simulation instance. These could simulate centralized authentication servers, file localization or file metadata servers or a bank of currently online and offline storage nodes.

class HDFSMaster(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: app.domain.master_servers.Master

_process_simfile(path, cluster_class, node_class)[source]

Opens and processes the simulation filed referenced in path.

Overrides:

app.domain.master_servers.Master._process_simfile().

The method is exactly the same except for one instruction. The _split_files() is invoked with fixed bsize = 1MB. The reason for this is two-fold:

  • The default and, thus recommended, block size for the hadoop distributed file system is 128MB. The system is not designed to perform well with small file blocks, but SG requires many file blocks to work, hence being more effective with small block sizes.

  • Hadoop limits the minimum block size to be 1MB, dfs.namenode.fs-limits.min-block-size. For this reason, we make HDFSMaster split files into 1MB chunks, as that is the closest we would get to our Hive’s default block size in the real world.

The other difference is that the spread strategy is ignored. We are not interested in knowing if the way the files are initially spread affects the time it takes for clusters to achieve a steady-state distribution since in HDFS file block replicas are stationary on data nodes until they die.

Parameters
  • path (str) – The path to the simulation file. Including extension and parent folders.

  • cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See app.domain.cluster_groups.

  • node_class (str) – The name of the class used to instantiate network node instances through reflection. See app.domain.network_nodes.

Return type

None

class Master(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: object

Simulation manager class, some kind of puppet-master. Could represent an authentication server or a monitor that decides along with other Master entities what network nodes are online using consensus algorithms.

origin

The name of the simulation file name that started the simulation process.

Type

str

sid

Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

Type

int

epoch

The simulation’s current epoch.

Type

int

cluster_groups

A collection of cluster groups managed by the Master. Keys are cluster identifiers and values are the cluster instances.

Type

app.type_hints.ClusterDict

network_nodes

A dictionary mapping node identifiers to their instance objects. This collection differs from app.domain.cluster_groups.Cluster.members attribute in the sense that the former network_nodes includes all nodes, both online and offline, available on the entire distributed backup storage system regardless of their participation in any cluster group.

Type

app.type_hints.NodeDict

__init__(simfile_name, sid, epochs, cluster_class, node_class)[source]

Instantiates an Master object.

Parameters
  • simfile_name (str) – A path to the simulation file to be run by the simulator.

  • sid (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • epochs (int) – The number of discrete time steps the simulation lasts.

  • cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See cluster groups module.

  • node_class (str) – The name of the class used to instantiate network node instances through reflection. See network nodes module.

Return type

None

_create_network_nodes(json, node_class)[source]

Helper method that instantiates all network nodes that are specified in the simulation file.

Parameters
  • json (Dict[str, Any]) – The simulation file in JSON dictionary object format.

  • node_class (str) – The type of network node to create.

Return type

None

_new_cluster_group(cluster_class, size, fname)[source]

Helper method that initializes a new Cluster group.

Parameters
  • cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See cluster groups module.

  • size (int) – The cluster's initial memberhip size.

  • fname (str) – The name of the fille being stored in the cluster.

Returns

The Cluster instance.

Return type

ClusterType

_new_network_node(node_class, nid, node_uptime)[source]

Helper method that initializes a new Node.

Parameters
  • node_class (str) – The name of the class used to instantiate network node instances through reflection. See network nodes module.

  • nid (str) – An id that will uniquely identifies the network node.

  • node_uptime (str) – A float value in string representation that defines the uptime of the network node.

Returns

The Node instance.

Return type

NodeType

_process_simfile(path, cluster_class, node_class)[source]

Opens and processes the simulation filed referenced in path.

This method opens the file reads the json data inside it. Combined with app.environment_settings it sets up the class instances to be used during the simulation (e.g., cluster groups and network nodes). This method also be splits the file to be persisted in the simulation into multiple blocks or chunks and for triggering the initial file spreading mechanism.

Parameters
  • path (str) – The path to the simulation file. Including extension and parent folders.

  • cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See app.domain.cluster_groups.

  • node_class (str) – The name of the class used to instantiate network node instances through reflection. See app.domain.network_nodes.

Return type

None

_split_files(fname, cluster, bsize)[source]

Helper method that splits the files into multiple blocks to be persisted in a cluster group.

Parameters
  • fname (str) – The name of the file located in SHARED_ROOT folder to be read and splitted.

  • cluster (ClusterType) – A reference to the cluster group whose members will be responsible for ensuring the file specified in fname becomes durable.

  • bsize (int) – The maximum amount of bytes each file block can have.

Returns

A dictionary in which the keys are integers and values are file blocks, whose attribute number is the key.

Return type

ReplicasDict

execute_simulation()[source]

Starts the simulation processes.

Return type

None

find_online_nodes(n=1, blacklist=None)[source]

Finds n network nodes who are currently registered at the Master and whose status is online.

Parameters
  • n (int) – How many network node references the requesting entity wants to find.

  • blacklist (NodeDict) – A collection of nodes identifiers and their object instances, which specify nodes the requesting entity has no interest in.

Returns

A collection of network nodes which is at most as big as n, which does not include any node named in blacklist.

Return type

NodeDict

MAX_EPOCHS: Optional[int] = None
MAX_EPOCHS_PLUS_ONE: Optional[int] = None
class NewscastMaster(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: app.domain.master_servers.Master

__init__(simfile_name, sid, epochs, cluster_class, node_class)[source]

Instantiates an Master object.

Parameters
  • simfile_name (str) – A path to the simulation file to be run by the simulator.

  • sid (int) – Identifier that generates unique output file names, thus guaranteeing that different simulation instances do not overwrite previous out files.

  • epochs (int) – The number of discrete time steps the simulation lasts.

  • cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See cluster groups module.

  • node_class (str) – The name of the class used to instantiate network node instances through reflection. See network nodes module.

Return type

None

_process_simfile(path, cluster_class, node_class)[source]

Opens and processes the simulation filed referenced in path.

Overrides:

app.domain.master_servers.Master._process_simfile().

Newscast is a gossip-based P2P network. We assume erasure-coding would be used in this scenario and thus, for simplicity, we divide the specified file’s size into multiple 1/N, where N is the number of network nodes in the system.

Note

This class, NewscastCluster and NewscastNode were created to test our simulators performance, concerning the amount of supported simultaneous network nodes in a simulation. We do not actually care if the created file blocks are lost as the network nodes job in the simulation is to carry out the protocol defined in PeerSim’s AverageFunction. PeerSim uses configuration Example 2 provided in release 1.0.5, as a means of testing the simulator performance, according to this Ms.C. dissertation by J. Neto. This configuration uses Newscast protocol with AverageFunction and periodic monitoring of the system state. We implement our version of Adaptaive Peer Sampling with Newscast by N. Tölgyesi and M. Jelasity, to avoid the effort of translating PeerSim’s code.

Parameters
  • path (str) – The path to the simulation file. Including extension and parent folders.

  • cluster_class (str) – The name of the class used to instantiate cluster group instances through reflection. See app.domain.cluster_groups.

  • node_class (str) – The name of the class used to instantiate network node instances through reflection. See app.domain.network_nodes.

Return type

None

class SGMaster(simfile_name, sid, epochs, cluster_class, node_class)[source]

Bases: app.domain.master_servers.Master

get_cloud_reference()[source]

Use to obtain a reference to 3rd party cloud storage provider

The cloud storage provider can be used to temporarely host files belonging to cluster clusters in bad conditions that may compromise the file durability of the files they are responsible for persisting.

Note

This method is virtual.

Returns

A pointer to the cloud server, e.g., an IP Address.

Return type

str

_PersistentingDict: Dict[str, Dict[str, Union[List[str], str]]]

app.domain.network_nodes

This module contains domain specific classes that represent network nodes responsible for the storage of file blocks. These could be reliable servers or P2P nodes.

class HDFSNode(uid, uptime)[source]

Bases: app.domain.network_nodes.Node

Represents a data node in the Hadoop Distribute File System.

execute_epoch(cluster, fid)[source]

Instructs the HDFSNode instance to execute the epoch.

The method iterates files held in disk and attempts to corrupt them silently. In HDFS file blocks’ sha256 are only verified when a user or client accesses the remote replica. Hence, no replication epoch is set up when a corruption occurs. The corruption is still logged in the output file.

Overrides:

app.domain.network_nodes.Node.execute_epoch().

Parameters
Return type

None

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Replicas are sent selectively in descending order to the most reliable Nodes in the Cluster down to the least reliable.

Overrides:

app.domain.network_nodes.Node.replicate_part().

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters
Return type

None

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Overrides:

app.domain.network_nodes.Node.update_status().

Returns

The the status of the Node.

Return type

Status

class NewscastNode(uid, uptime)[source]

Bases: app.domain.network_nodes.Node

Represents a Peer running Newscast protocol, using shuffling techniques to exchange acquaintances with other network peers and performing peer degree aggregation using AverageFunction.

view

A partial view of the P2P network. View is a collection of network nodes, the NewscastNode instance may contact other than himself. Keys are NewscastNode instances, and values are their age in the dictionary. A key-value pair is commonly referenced as a descriptor.

aggregation_value

Stores the aggregation value. The type of aggregation_value is defined by the body of the aggregate() method.

__init__(uid, uptime)[source]

Instantiates a Node object.

These are network nodes responsible for persisting file block replicas.

Parameters
  • uid (str) – An unique identifier for the Node instance.

  • uptime (float) – The availability of the Node instance.

Return type

None

_merge(a, b)[source]

Merges two network views. If a node descriptor exists in both views, the most recent descriptor is kept.

Parameters
  • a (_NetworkView) – A dictionary where keys are network nodes and values are their respective age in the view.

  • b (_NetworkView) – A dictionary where keys are network nodes and values are their respective age in the view.

Returns

The set union of both views with only the most up to date descriptors.

Return type

_NetworkView

_select_view(view_buffer)[source]

Reduces the size of the view to a predefined maximum size.

:param A dictionary where keys are network nodes: :param and values are their respective age in the view.:

Returns

The view_buffer with at most max_view_size descriptors.

Parameters

view_buffer (_NetworkView) –

Return type

_NetworkView

add_neighbor(node)[source]

Adds a new network node to the node instance’s view.

If the view is full, the eldest node is replaced with the new node. Otherwise, the new NewscastNode is added to the instance’s view with age zero, unless the entry is already in view or the node is the current NewscastNode instance.

Returns

True if node was successfuly added, False otherwise.

Parameters

node (app.domain.network_nodes.NewscastNode) –

Return type

bool

aggregate(node=None)[source]

The network node instance contacts another node from his view, then, both nodes assign the mean of their degrees to aggregation_value.

Parameters

node (Optional[app.domain.network_nodes.NewscastNode]) – When node is None a random NewscastNode is selected from view. When specified to be contacted is the one referenced in the parameter.

Return type

None

execute_epoch(cluster, fid)[source]

Instructs the NewscastNode instance to execute the epoch.

During the execution of the epoch, the NewscastNode instance randomly selects another NewscastNode who belongs to his view and aggregates their degree using the Average Function. Sometimes, during the epoch, the NewscastNode instance will also perform shuffling with the selected target.

Overrides:

app.domain.network_nodes.Node.execute_epoch().

Parameters
Return type

None

get_degree()[source]

Counts the number of descriptors in the node’s view.

Returns

The degree of the NewscastNode instance.

Return type

int

get_node()[source]

Gets a random node from the current network view.

Each candidate NewscastNode to be returned is first pinged, if no answer is obtained, another node is selected as a candidate by iterating a list representation of view and the previous candidate is removed from the view.

Note

Newscast should always return a random node, thus iteration should not be used, but this search is more efficient and readable.

Returns

The selected NewscastNode.

Return type

Optional[app.domain.network_nodes.NewscastNode]

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Similar to send_part() but with slightly different instructions. In particular new replicas can not be corrupted at the current node, at the current epoch.

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters
Raises

NotImplementedError – When children of Node do not implement the abstract method.

Return type

None

shuffle(node)[source]

Starts a shuffle process that merges and crops two nodes’ views at the current node and at the destination node.

The final view consists of most up to date descriptors from both views up to a maximum of max_view_size descriptors.

Parameters

node (app.domain.network_nodes.NewscastNode) – The node to be contacted for shuffling.

Return type

None

shuffle_request(senders_view)[source]

Merges and crops two nodes’ views at the current node.

The final view consists of most up to date descriptors from both views up to a maximum of max_view_size descriptors.

Parameters

senders_view (_NetworkView) – A dictionary where keys are network nodes and values are their respective age in the view.

Returns

A view and a fresh descriptor from the NewscastNode instance, before it is merged with the requestor’s view.

Return type

_NetworkView

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Overrides:

app.domain.network_nodes.Node.update_status().

Returns

The the status of the Node.

Return type

Status

class Node(uid, uptime)[source]

Bases: object

This class contains basic network node functionality that should always be useful.

id

A unique identifier for the Node instance.

Type

str

uptime

The amount of time the Node is expected to remain online without disconnecting. Current uptime implementation is based on availability percentages.

Note

Current implementation expects network nodes joining a cluster group to remain online for approximately:

time_to_live = uptime * MAX_EPOCHS.

However, a network node who belongs to multiple cluster groups may disconnect earlier than that, i.e., network nodes remain online time_to_live after their first operation on the distributed backup system.

Type

float

status

Indicates if the Node instance is online or offline. In later releases this could also contain a ‘suspect’ status.

Type

app.domain.helpers.enums.Status

suspicious_replies

Collection that contains http codes that when received, trigger complaints to monitors about the replier.

Type

set

files

A dictionary mapping file names to dictionaries of file block identifiers and their respective contents, i.e., the file block replicas hosted at the Node.

Type

Dict[str, ReplicasDict]

__init__(uid, uptime)[source]

Instantiates a Node object.

These are network nodes responsible for persisting file block replicas.

Parameters
  • uid (str) – An unique identifier for the Node instance.

  • uptime (float) – The availability of the Node instance.

Return type

None

discard_part(fid, number, corrupt=False, cluster=None)[source]

Safely deletes a part from the SGNode instance’s disk.

Parameters
  • fid (str) – Name of the file the file block replica belongs to.

  • number (int) – The part number that uniquely identifies the file block.

  • corrupt (bool) – If discard is being invoked due to identified file block corruption, e.g., Sha256 does not match the expected.

  • cluster (ClusterType) – Cluster that will set the replication epoch or mark the simulation as failed.

Return type

None

execute_epoch(cluster, fid)[source]

Instructs the Node instance to execute the epoch.

Parameters
Raises

NotImplementedError – When children of Node do not implement the abstract method.

Return type

None

get_file_parts(fid)[source]

Gets collection of file parts that correspond to the named file.

Parameters

fid (str) – The file name identifier that designates the file block replicas to be retrieved.

Returns

A dictionary where keys are file block numbers and values are file block replicas

Return type

ReplicasDict

get_file_parts_count(fid)[source]

Counts the number of file block replicas of a specific file owned by the Node.

Parameters

fid (str) – The file name identifier that designates the file block replicas to be counted.

Returns

The number of counted replicas.

Return type

int

is_suspect()[source]

Returns True if the node is behaving suspiciously, else False.

Return type

bool

is_up()[source]

Returns True if the node is online, else False.

Return type

bool

receive_part(replica)[source]

Endpoint for file block replica reception.

The Node stores a new file block replica in files if he does not have a replica with same identifier.

Parameters

replica (domain.helpers.smart_dataclasses.FileBlockData) – The file block replica to be received by Node.

Returns

If upon integrity verification the sha256 hashvalue differs from the expected, the worker replies with a BAD_REQUEST. If the Node already owns a replica with the same identifier it replies with NOT_ACCEPTABLE. Otherwise it replies with a OK, i.e., the delivery is successful.

Return type

HttpCodes

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Similar to send_part() but with slightly different instructions. In particular new replicas can not be corrupted at the current node, at the current epoch.

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters
Raises

NotImplementedError – When children of Node do not implement the abstract method.

Return type

None

send_part(cluster, destination, replica)[source]

Attempts to send a replica to some other network node.

Parameters
  • cluster (ClusterType) – A reference to the Cluster that will deliver the new replica. In a real world implementation this argument would not make sense, but we use it to facilitate simulation management and environment logging.

  • destination (str) – The name, address or another unique identifier of the node that will receive the file block replica.

  • replica (FileBlockData) – The file block container to be sent to some other worker.

Returns

An http code.

Return type

HttpCodes

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Returns

The the status of the Node.

Return type

Status

class SGNode(uid, uptime)[source]

Bases: app.domain.network_nodes.Node

Represents a network node that executes a Swarm Guidance algorithm.

clusters

A collection of cluster groups the SGNode is a member of.

routing_table

Contains the information required to appropriately route file block blocks to other SGNode instances.

Type

Dict[str, DataFrame]

__init__(uid, uptime)[source]

Instantiates a Node object.

These are network nodes responsible for persisting file block replicas.

Parameters
  • uid (str) – An unique identifier for the Node instance.

  • uptime (float) – The availability of the Node instance.

Return type

None

execute_epoch(cluster, fid)[source]

Instructs the Node instance to execute the epoch.

The method iterates all file block blocks in files and independently decides if they should be sent to another SGNode by following the probabilities in routing_table column vectors.

Overrides:

app.domain.network_nodes.Node.execute_epoch().

Parameters
Return type

None

remove_file_routing(fid)[source]

Removes a file name from the SGNode routing table.

This method is called when a SGNode is evicted from the cluster group and results in the deletion from disk of all file block replicas with identifier fid.

Parameters

fid (str) – The file name identifier of the file whose routing is being eliminated.

Return type

None

replicate_part(cluster, replica)[source]

Attempts to restore the replication level of the specified file block replica.

Similar to send_part() but with slightly different instructions. In particular new replicas can not be corrupted at the current node, at the current epoch. The replicas are also sent selectively in descending order to the most reliable Nodes in the Cluster down to the least reliable. Whereas send_part(). follows stochastic swarm guidance routing.

Overrides:

app.domain.network_nodes.Node.replicate_part().

Note

There are no guarantees that REPLICATION_LEVEL will be completely restored during the execution of this method.

Parameters
Return type

None

select_destination(fid)[source]

Selects a random message destination according to routing_table probabilities for the specified file name.

Parameters

fid (str) – The file name identifier to obtain the proper routing_table for destination selection.

Returns

The name or address of the selected destination.

Return type

str

set_file_routing(fid, v_)[source]

Maps a file name identifier with a transition column vector used for file block replica routing.

Parameters
  • fid (str) – The file name identifier of the file whose routing is being configured.

  • v_ (Union[Series, DataFrame]) – A column vector with probabilities that dictate the odds of sending file block blocks belonging to the file with specified id to other Cluster members also working on the persistence of the file block blocks.

Raises

ValueError – If transition_vector is not a DataFrame and cannot be casted to it.

Return type

None

class SGNodeExt(uid, uptime)[source]

Bases: app.domain.network_nodes.SGNode

Represents a network node that executes a Swarm Guidance algorithm.

SGNodeExt instances differ from SGNode in the sense that the latter does not monitor the peers belonging to his cluster groups, concerning their connectivity status or suspicious behaviours.

update_status()[source]

Used to update the time to live of the node instance.

When invoked, the network node decides if it should remain online or change some other state.

Overrides:

app.domain.network_nodes.Node.update_status().

Returns

The the status of the Node.

Return type

Status

_NetworkView: Dict[Union[str, app.domain.network_nodes.Node], int]