Mesh and Field Data Generation

MACSio can generate a wide variety of mesh and variable data, as well as the amorphous metadata typical of HPC multi-physics applications. Currently, MACSio can generate structured, rectilinear, unstructured and arbitrary polyhedral, multi-block meshes in 2 or 3 dimensions. Regardless of the particular type of mesh MACSio generates for I/O performance testing, it stores and marshals all of the resulting data in a single large JSON-C object that is passed around within MACSio and between MACSio and its I/O plugins.

MACSio employs a very simple algorithm to generate and then decompose a mesh in parallel. The decomposition is nonetheless general enough to create multiple mesh pieces on individual MPI ranks and to let the number of mesh pieces vary, somewhat, between MPI ranks. At present, there is no support for explicitly specifying a particular arrangement of mesh pieces and MPI ranks. However, such an enhancement is planned for the near future.

Once the global whole mesh shape is determined as a count of total pieces and as counts of pieces in each of the logical dimensions, MACSio uses a very simple algorithm to assign mesh pieces to MPI ranks. The global list of mesh pieces is numbered starting from 0. First, the number of pieces to assign to rank 0 is chosen. When the average piece count is non-integral, it is a value between K and K+1. So, MACSio randomly chooses either K or K+1 pieces, being careful to weight the randomness so that once all pieces are assigned to all ranks, the target average piece count per rank is achieved. MACSio then assigns the next K or K+1 numbered pieces to the next MPI rank. It continues assigning pieces to MPI ranks, in piece number order, until all MPI ranks have been assigned pieces. The algorithm runs identically on all ranks. When the algorithm reaches the part assignment for the rank on which it is executing, it then generates the K or K+1 mesh pieces for that rank. Although the algorithm is essentially sequential with asymptotic behavior O(#total pieces), it is primarily a simple book-keeping loop which completes in a fraction of a second even for more than one million pieces.
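The assignment loop described above can be sketched as follows. This is a minimal illustration, not MACSio's implementation: the names assign_pieces and lcg_next, the small linear congruential generator, and the particular weighting (take K+1 with probability equal to the remainder divided by the number of ranks still to be served) are all assumptions chosen so that the piece counts sum exactly to the total.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: a tiny LCG so that every rank draws the same
   sequence from the same seed without touching C library state. */
static unsigned lcg_next(unsigned *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;
}

/* Hypothetical sketch: walk ranks 0..my_rank, handing each one the next
   K or K+1 consecutive piece ids. Returns the number of pieces owned by
   my_rank and stores the id of its first piece in *first. */
static int assign_pieces(int total_pieces, int nranks, int my_rank,
                         unsigned seed, int *first)
{
    int remaining = total_pieces;
    int next_id = 0;
    int r, cnt = 0;

    assert(nranks > 0 && my_rank >= 0 && my_rank < nranks);

    for (r = 0; r <= my_rank; r++) {
        int ranks_left = nranks - r;
        int k = remaining / ranks_left;      /* floor of the remaining average */
        int extra = remaining % ranks_left;  /* ranks that must still get k+1 */

        /* Choose k+1 with probability extra/ranks_left. Because extra is
           always less than ranks_left, the counts sum to total_pieces. */
        cnt = k + ((int) (lcg_next(&seed) % (unsigned) ranks_left) < extra ? 1 : 0);

        if (r == my_rank)
            *first = next_id;
        next_id += cnt;
        remaining -= cnt;
    }
    return cnt;
}
```

Every rank would call assign_pieces with the same total_pieces, nranks and seed but its own my_rank, so all ranks replay the same bookkeeping and agree on who owns which piece ids without any communication.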

Each piece of the mesh is a simple rectangular region of space. The spatial bounds of that region are easily determined. Any variables to be placed on the mesh can be easily handled as long as the variable’s spatial variation can be described in the global geometric space.
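As an illustration of how piece bounds and field values might be derived, the sketch below maps a global piece id to a rectangular region and evaluates a field at a point. It assumes a unit-cube global domain, piece ids that run fastest in the first logical dimension, and a made-up linear field; the names piece_bounds_t, piece_bounds and example_field are hypothetical and not part of MACSio.

```c
#include <assert.h>

/* Hypothetical description of one rectangular piece. */
typedef struct {
    double lo[3];  /* lower corner in global coordinates */
    double hi[3];  /* upper corner in global coordinates */
} piece_bounds_t;

/* Map a global piece id to its logical (i,j,k) position and then to
   spatial bounds inside an assumed unit-cube domain. For a 2D mesh,
   pass npieces[2] = 1. */
static piece_bounds_t piece_bounds(int piece_id, const int npieces[3])
{
    piece_bounds_t b;
    int d, idx[3];

    idx[0] = piece_id % npieces[0];
    idx[1] = (piece_id / npieces[0]) % npieces[1];
    idx[2] = piece_id / (npieces[0] * npieces[1]);

    for (d = 0; d < 3; d++) {
        b.lo[d] = (double) idx[d] / npieces[d];
        b.hi[d] = (double) (idx[d] + 1) / npieces[d];
    }
    return b;
}

/* Example field defined purely in global coordinates, so it can be
   evaluated on any piece without knowledge of the other pieces. */
static double example_field(double x, double y, double z)
{
    return x + 2.0 * y + 3.0 * z;
}
```

Because the field depends only on global coordinates, each rank can fill in variable values for its own pieces independently.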

Data API

group MACSIO_DATA

Data (Mesh) Generation.

Functions

struct json_object *MACSIO_DATA_GenerateTimeZeroDumpObject(struct json_object *main_obj, int *rank_owning_chunkId)

Parameters
  • main_obj: The main JSON object holding mesh, field, amorphous data

  • rank_owning_chunkId: missing this info

int MACSIO_DATA_GetRankOwningPart(struct json_object *main_obj, int chunkId)

Given a chunkId, return rank of owning task.

int MACSIO_DATA_ValidateDataRead(struct json_object *main_obj)

Not yet implemented.

int MACSIO_DATA_SimpleAssignKPartsToNProcs(int k, int n, int my_rank, int *my_part_cnt, int **my_part_ids)

Not yet implemented.

struct json_object *MACSIO_DATA_EvolveDataset(struct json_object *main_obj, int *dataset_evolved, float factor, int growth_bytes)

Vary mesh and field data including changing not only values but sizes.

Random (Amorphous) Object Generation

In addition to the data necessary to represent the main mesh and field data MACSio produces, it is also common for HPC multi-physics codes to generate modest amounts of amorphous metadata dealing with such things as materials, material models and material properties tables, user input controls, performance tracking information, library versions and data provenance, debugging information, etc. This kind of information often takes the form of a large hierarchy of key-value pairs where the values can sometimes be arrays on the order of the size of the number of tasks, etc.

Two methods in MACSio’s data generation package help to create some random but nonetheless statistically similar kinds of data.

group MACSIO_RANDOBJ

Construct a random JSON object.

Functions

struct json_object *MACSIO_DATA_MakeRandomObject(int maxd, int nraw, int nmeta, unsigned maxds)

Parameters
  • maxd: maximum depth of object tree

  • nraw: bytes of raw data (in extarr objects) in final object

  • nmeta: bytes of meta data (in non-extarr objects) in final object

  • maxds: maximum size of any given raw (extarr) object

struct json_object *MACSIO_DATA_MakeRandomTable(int nrecs, int totbytes)

Construct a random table of some number of random JSON objects.

Parameters
  • nrecs: number of records in the table

  • totbytes: total bytes in the table

Randomization

Randomization of several of MACSio’s behaviors may be explicitly enforced or prevented. To ensure such behaviors, however, random number generation must be handled carefully, especially when reproducibility from run to run, agreement across parallel tasks, or both are required.

For this reason, MACSio creates and initializes several default pseudo random number generators (PRNGs), all of which utilize the C standard library random() function.

Pseudo Random Number Generator (PRNG) support is handled by maintaining a series of state vectors for calls to initstate/setstate of C library random() and friends. Multiple different PRNGs are supported, each having zero or more special properties depending on how they are initialized (e.g. the seed used from run to run and/or from task to task) as well as how they are expected to be used/called. For example, if a PRNG is intended to produce identical values on each task, then it is up to the caller to ensure the respective PRNG is called consistently (e.g. collectively) across tasks. Otherwise, unintended or indeterminate behavior may result.
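The state-vector approach described above can be sketched with the POSIX initstate(), setstate() and random() calls. This is a minimal illustration under stated assumptions, not MACSio's code: the stream_t type, the stream_* function names and the 64-byte state size are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600  /* expose random(), initstate(), setstate() */
#endif
#include <assert.h>
#include <stdlib.h>

#define STATE_BYTES 64

/* Hypothetical stream record: a private state vector plus its seed. */
typedef struct {
    char state[STATE_BYTES];
    unsigned seed;
} stream_t;

/* Initialize the stream's state vector, then restore whichever state
   was active before so other streams are not disturbed. */
static void stream_init(stream_t *s, unsigned seed)
{
    char *prev;
    s->seed = seed;
    prev = initstate(seed, s->state, STATE_BYTES);
    setstate(prev);
}

/* Swap in this stream's state, draw one value, swap the old state back. */
static long stream_next(stream_t *s)
{
    char *prev = setstate(s->state);
    long v = random();
    setstate(prev);
    return v;
}

/* Restart the stream's sequence by re-seeding its state vector. */
static void stream_reset(stream_t *s)
{
    stream_init(s, s->seed);
}
```

Since each stream owns its state vector, two streams created with the same seed yield the same sequence no matter how calls to them, or to random() itself, are interleaved.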

The default PRNGs MACSio creates are…

Naive: MD_random_naive

The naive PRNG is fine for any randomization that does not need special behaviors. This is the one that is expected to be used most of the time. Think of it as a wholesale replacement for using C library random() directly. MD_random is a shorthand alias for MD_random_naive.

Naive, Time Varying: MD_random_naive_tv

Like MD_random_naive except that it is seeded based on current time and will therefore ensure variation from run to run (i.e. is time varying). It should be used in place of MD_random_naive where randomization from run to run is needed.

Naive, Rank Varying: MD_random_naive_rv

Like MD_random_naive but ensures variation across tasks (i.e. is rank varying) by seeding based on task rank. It should be used in place of MD_random_naive where randomization from task to task is needed.

Naive, Rank and Time Varying: MD_random_naive_rtv

Like MD_random_naive_rv but also ensures results vary from rank to rank and run to run.

Rank Invariant: MD_random_rankinv

Is seeded based on neither rank nor time, and must not be used by a caller in a way that would cause inconsistency across ranks. It is up to the caller to ensure it is called consistently (e.g. collectively) across ranks. Otherwise indeterminate behavior may result.

Rank Invariant, Time Varying: MD_random_rankinv_tv

Like MD_random_rankinv except ensures variation in randomization from run to run. So, for each given run, all ranks will agree on randomization but from run to run, that agreement will vary.

If these PRNGs are not sufficient, developers are free to create other PRNGs for other specific purposes using MACSIO_DATA_CreatePRNG().
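One way to picture how the six default PRNGs differ is by what their seeds depend on. The sketch below is purely illustrative: the flag names, the choose_seed function, the base constant and the mixing arithmetic are assumptions, and the actual seeds MACSio uses are not given in this document.

```c
#include <assert.h>

/* Hypothetical flags describing what a PRNG's seed depends on. */
enum { SEED_FIXED = 0, SEED_BY_RANK = 1, SEED_BY_TIME = 2 };

/* Hypothetical seed recipe: start from a constant and fold in the rank
   and/or the shared time value according to the requested properties. */
static unsigned choose_seed(int flags, unsigned mpi_rank, unsigned curr_time)
{
    unsigned seed = 0xDeadBeefu;

    if (flags & SEED_BY_RANK)
        seed ^= (mpi_rank + 1u) * 2654435761u;  /* differs on every rank */
    if (flags & SEED_BY_TIME)
        seed ^= curr_time;                      /* differs on every run */
    return seed;
}
```

Under this reading, MD_random_naive and MD_random_rankinv would correspond to SEED_FIXED, MD_random_naive_tv and MD_random_rankinv_tv to SEED_BY_TIME, MD_random_naive_rv to SEED_BY_RANK, and MD_random_naive_rtv to SEED_BY_RANK | SEED_BY_TIME. The rank-invariant variants differ from the naive ones with the same seeding only in the expectation that callers invoke them collectively.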

PRNG API

group MACSIO_PRNGS

Randomization and Pseudo Random Number Generators.

Defines

MD_random_naive()

Get next random value from naive PRNG. Same as MD_random().

The naive PRNG is fine for any randomization that does not need special behaviors. This is the one that is expected to be used most of the time.

MD_random()

Convenience shorthand for MD_random_naive()

MD_random_naive_tv()

Get next random value from naive, time-variant PRNG.

The naive_tv PRNG is like naive except that it is seeded based on current time and so will change from run to run (i.e. is time-variant). It should be used in place of MD_random_naive() when randomization from run to run is required.

MD_random_naive_rv()

Get next random value from naive, rank-variant PRNG.

Like MD_random_naive() but guarantees variation across ranks.

MD_random_naive_rtv()

Get next random value from naive, rank- and time-variant PRNG.

Like MD_random_naive_rv() but guarantees variation from run to run.

MD_random_rankinv()

Get next random value from rank-invariant PRNG.

The rank_invariant PRNG is not seeded based on rank and must not be used in a way that would cause inconsistency across ranks. Failing to call it collectively will likely cause indeterminate behavior. There is no enforcement of this requirement. However, a sanity check is performed at application termination: there is a collective MPI_BXOR reduce on the current value from this PRNG and, if the result is non-zero, an error message is generated in the logs.

MD_random_rankinv_tv()

Get next random value from rank-invariant but time-variant PRNG.

Like MD_random_rankinv() except that it causes variation in randomization from run to run.
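The termination-time sanity check described for MD_random_rankinv() can be illustrated without MPI by reducing an array that holds one value per rank. This is a sketch only: xor_reduce stands in for a reduction with MPI_BXOR, and all_agree is an alternative check that is not taken from the MACSio documentation. One property worth keeping in mind is that identical values cancel to zero under XOR only when the number of contributors is even; for an odd count the result is the common value itself. How MACSio accounts for this internally is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a bitwise-XOR reduction: combine one value per rank. */
static unsigned xor_reduce(const unsigned *vals, size_t nranks)
{
    unsigned acc = 0;
    size_t i;

    for (i = 0; i < nranks; i++)
        acc ^= vals[i];
    return acc;
}

/* Alternative check (an assumption, not MACSio's method): compare every
   rank's value directly against the first one. Returns 1 on agreement. */
static int all_agree(const unsigned *vals, size_t nranks)
{
    size_t i;

    for (i = 1; i < nranks; i++)
        if (vals[i] != vals[0])
            return 0;
    return 1;
}
```

In a real MPI program, the direct comparison could be realized, for example, by reducing with both MPI_MIN and MPI_MAX and checking that the two results are equal.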

Functions

int MACSIO_DATA_CreatePRNG(unsigned seed)

Create a Pseudo Random Number Generator (PRNG)

Parameters
  • seed: seed for initializing random number generator

Return

An integer identifier for the created PRNG

void MACSIO_DATA_ResetPRNG(int prng_id)

Reset a PRNG. This will cause the specified PRNG to restart the sequence of pseudo random numbers it generates.

Parameters
  • prng_id: identifier for the PRNG to reset

long MACSIO_DATA_GetValPRNG(int prng_id)

Get next value from PRNG.

Parameters
  • prng_id: identifier of the desired PRNG to get a value from

Return

Next value from the PRNG sequence. Logically equivalent to calling random().

void MACSIO_DATA_DestroyPRNG(int prng_id)

Free resources associated with a PRNG.

Parameters
  • prng_id: identifier of the desired PRNG to free

void MACSIO_DATA_InitializeDefaultPRNGs(unsigned mpi_rank, unsigned curr_time)

Initialize default Pseudo Random Number Generators (PRNGs). Should be called near the beginning of application startup. Creates the (default) PRNGs described here. Applications are free to create others as needed.

Parameters
  • mpi_rank: unsigned representation of MPI rank of calling processor

  • curr_time: unsigned, otherwise arbitrary representation of the current time that is equal on all ranks yet guaranteed to vary from run to run.

void MACSIO_DATA_FinalizeDefaultPRNGs(void)

Free up resources for default PRNGs. Should be called near the termination of the application.

PRNG Example (tstprng.c)

#define _GNU_SOURCE /* strcasestr() is a non-standard extension; glibc needs this */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#ifdef HAVE_MPI
#include <mpi.h>
#endif

#include <macsio_data.h>

int main(int argc, char **argv)
{
    int i, id1, id2, id3, id5;
    int parallel = 0, ckfail = 0;
    long tseed;
    long series1[100], series2[100], series3[100], series4[100][5], series5[100][5];
    int rank=0, size=1;
    struct timeval tv;

    for (i = 1; i < argc; i++)
    {
        /* strcasestr() returns non-NULL when the option text is found */
        if (strcasestr(argv[i], "--parallel"))
            parallel = 1;
        else if (strcasestr(argv[i], "--ckfail"))
            ckfail = 1;
    }

#ifdef HAVE_MPI
    if (parallel)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    }
#endif

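    gettimeofday(&tv, 0);
    tseed = (long) (((unsigned int) (tv.tv_sec + tv.tv_usec)) & 0xFFFFFFFF);
#ifdef HAVE_MPI
    /* curr_time must be equal on all ranks, so share rank 0's value */
    if (parallel)
        MPI_Bcast(&tseed, 1, MPI_LONG, 0, MPI_COMM_WORLD);
#endif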

    MACSIO_DATA_InitializeDefaultPRNGs(rank, tseed);

    id1 = MACSIO_DATA_CreatePRNG(0xDeadBeef);
    id2 = MACSIO_DATA_CreatePRNG(tseed); /* time varying output */
    id3 = MACSIO_DATA_CreatePRNG(0xBabeFace);
    id5 = MACSIO_DATA_CreatePRNG(0xDeadBeef);

    for (i = 0; i < 100; i++)
    {
        int j, id4 = MACSIO_DATA_CreatePRNG(0x0BedFace);
        series1[i] = MACSIO_DATA_GetValPRNG(id1);
        series2[i] = i%3?0:MACSIO_DATA_GetValPRNG(id2);
        series3[i] = i%5?0:MACSIO_DATA_GetValPRNG(id3);
        for (j = 0; j < 5; j++)
            series4[i][j] = MACSIO_DATA_GetValPRNG(id4);
        MACSIO_DATA_DestroyPRNG(id4);
        for (j = 0; j < 5; j++)
            series5[i][j] = MACSIO_DATA_GetValPRNG(id5);
        MACSIO_DATA_ResetPRNG(id5);
        if (i && !(i%50))
            MACSIO_DATA_ResetPRNG(id2);
        MD_random();
        MD_random_rankinv();
        MD_random_rankinv_tv();
        if (ckfail && rank == 0 && i < 5)
        {
            MD_random_rankinv();
            MD_random_rankinv_tv();
        }
    }
    if (memcmp(&series2[0], &series2[51], 40*sizeof(long)))
        return 1;
    if (memcmp(&series4[0], &series4[1], 5*sizeof(long)))
        return 1;
    if (memcmp(&series4[1], &series4[23], 5*sizeof(long)))
        return 1;
    if (memcmp(&series5[0], &series5[1], 5*sizeof(long)))
        return 1;
    if (memcmp(&series5[1], &series5[23], 5*sizeof(long)))
        return 1;

    MACSIO_DATA_DestroyPRNG(id1);
    MACSIO_DATA_DestroyPRNG(id2);
    MACSIO_DATA_DestroyPRNG(id3);
    MACSIO_DATA_DestroyPRNG(id5);

#ifdef HAVE_MPI
    if (parallel)
    {
        /* Lets confirm the rank-invariant PRNGs agree and issue a message if not */
        unsigned rand_check[2] = {MD_random_rankinv(), MD_random_rankinv_tv()};
        unsigned rand_check_r[2] = {0, 0}; /* only rank 0 receives the reduced result */
        MPI_Reduce(rand_check, rand_check_r, 2, MPI_UNSIGNED, MPI_BXOR, 0, MPI_COMM_WORLD);
        assert(rand_check_r[0] == 0 && rand_check_r[1] == 0);
    }
#endif

    MACSIO_DATA_FinalizeDefaultPRNGs();

#ifdef HAVE_MPI
    if (parallel)
        MPI_Finalize();
#endif

    return 0;
}