Major Objects

From Bitpost wiki
Latest revision as of 02:44, 3 August 2020

Use Major Objects for fast, thread-safe, in-memory handling of large amounts of data that must also be persisted. Support complex objects with simple keys, fast lookup by multiple keys, serialization from/to multiple persistence layers, and fast DRY schema changes.

Major Objects

  • Use an unordered_set of const pointers to objects derived from PersistentIDObject (see below)
  • The main container's primary key is always db_id
  • Always use the db_id for foreign keys
  • Other containers can be created using other members as keys; the only cost is for a new set of pointers (not objects!)
    • For other unordered_sets, just define new hash functions (and matching equality functions) on the alternate key
    • Other useful containers are sorted_vector and map (<key,value> pair)
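
The container layout above can be sketched as follows. This is only a minimal illustration, not the wiki's actual code: User, the functor names, and the find helpers are hypothetical stand-ins for a class derived from PersistentIDObject and its indexes. The same objects sit in two sets of pointers, one hashed on db_id and one on another member, and lookups use a throwaway key object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a class derived from PersistentIDObject.
struct User {
    int64_t db_id;
    std::string name;
};

// Primary key: hash and compare on db_id.
struct HashById {
    std::size_t operator()(const User* u) const { return std::hash<int64_t>()(u->db_id); }
};
struct EqualById {
    bool operator()(const User* a, const User* b) const { return a->db_id == b->db_id; }
};

// Alternate key: hash and compare on name.
struct HashByName {
    std::size_t operator()(const User* u) const { return std::hash<std::string>()(u->name); }
};
struct EqualByName {
    bool operator()(const User* a, const User* b) const { return a->name == b->name; }
};

// Both sets hold pointers only; the objects themselves are never copied.
using UsersById   = std::unordered_set<const User*, HashById, EqualById>;
using UsersByName = std::unordered_set<const User*, HashByName, EqualByName>;

// Look up by db_id using a temporary object that carries only the key.
const User* findById(const UsersById& users, int64_t db_id) {
    User key{db_id, ""};
    auto it = users.find(&key);
    return it == users.end() ? nullptr : *it;
}

// Look up by name using a temporary object that carries only the key.
const User* findByName(const UsersByName& users, const std::string& name) {
    User key{0, name};
    auto it = users.find(&key);
    return it == users.end() ? nullptr : *it;
}
```

Adding another index costs one more pointer per object plus a hash/equality pair, which matches the "new set of pointers (not objects!)" point above.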

PersistentIDObject

  • Add a dirty flag to all objects, set to true on any change that must be persisted
  • Use an internal in-memory max_db_id counter in the parent to generate the next db_id for a newly created object
  • This means that when creating new objects, there is NO NEED to access the db to obtain an id, VERY IMPORTANT!
  • Use delayed-write tactics to write all dirty objects on idle time
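
A minimal sketch of the ideas in this list, under stated assumptions: SketchIDObject is a hypothetical class, not the real PersistentIDObject (whose interface is not shown on this page), and the sentinel value, assignNewId(), and bTemporary() are invented for illustration. It shows a dirty flag plus a parent-owned max_db_id counter that hands out the next id without touching the database.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch; the real PersistentIDObject may differ in every detail.
class SketchIDObject {
public:
    static constexpr int64_t DBID_DO_NOT_SAVE = -1;  // assumed sentinel value

    // max_db_id is a counter owned by the parent and shared by all siblings.
    explicit SketchIDObject(int64_t& max_db_id, int64_t db_id = DBID_DO_NOT_SAVE)
        : max_db_id_(max_db_id), db_id_(db_id), dirty_(false)
    {
        // When loading an existing id, keep the counter at or above it.
        if (db_id_ != DBID_DO_NOT_SAVE)
            max_db_id_ = std::max(max_db_id_, db_id_);
    }

    // Promote a temporary object to a persistent one: no database access needed.
    void assignNewId() { db_id_ = ++max_db_id_; dirty_ = true; }

    bool bTemporary() const { return db_id_ == DBID_DO_NOT_SAVE; }
    bool bDirty() const     { return dirty_; }
    void setDirty()         { if (!bTemporary()) dirty_ = true; }  // temporaries are never written
    void setSaved()         { dirty_ = false; }
    int64_t id() const      { return db_id_; }

private:
    int64_t& max_db_id_;
    int64_t  db_id_;
    bool     dirty_;
};
```

The counter here is a plain integer, so concurrent creation would need a lock or an atomic; that choice is left open in this sketch.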

Memory Model

  • Use a Datastore manager (aka "MemoryModel") to hold sets
  • It can look up objects by any key, and strip away const to return a mutable object. NOTE that the user must not change the key members of a returned object, since that would invalidate the hash the set stored it under!
  • Derive a class from the memory model for persistence; it can use any persistence method (local, remote, sql, nosql, etc.).
  • Make sure that the base MemoryModel class is concrete not abstract, thread-safe and self-contained; this makes parallel calculations trivial, helps scalability, etc.
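
One way to picture the points above, as a hedged sketch only: SketchMemoryModel, Item, and RecordingModel are hypothetical names, and the real MemoryModel is not shown on this page. The base class is concrete and guards its set with a mutex, lookup strips const, and a derived class supplies the persistence step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_set>
#include <vector>

// Hypothetical object type; db_id is the key and must not be changed by callers.
struct Item {
    int64_t db_id;
    double  value;
    bool    dirty;
};
struct ItemHash {
    std::size_t operator()(const Item* p) const { return std::hash<int64_t>()(p->db_id); }
};
struct ItemEqual {
    bool operator()(const Item* a, const Item* b) const { return a->db_id == b->db_id; }
};

class SketchMemoryModel {
public:
    virtual ~SketchMemoryModel() = default;

    // The model does not take ownership in this sketch.
    void add(Item* p) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.insert(p);
    }

    // Look up by key, then strip const so the caller can change non-key members.
    Item* find(int64_t db_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        Item key{db_id, 0.0, false};
        auto it = items_.find(&key);
        return it == items_.end() ? nullptr : const_cast<Item*>(*it);
    }

    // Delayed write: persist every dirty object, return how many were written.
    int saveDirtyObjectsAsNeeded() {
        std::lock_guard<std::mutex> lock(mutex_);
        int written = 0;
        for (const Item* p : items_) {
            if (p->dirty && write(*p)) {
                const_cast<Item*>(p)->dirty = false;
                ++written;
            }
        }
        return written;
    }

protected:
    // The base is concrete: with no persistence layer, a write trivially succeeds.
    virtual bool write(const Item&) { return true; }

private:
    std::mutex mutex_;
    std::unordered_set<const Item*, ItemHash, ItemEqual> items_;
};

// Hypothetical persistence layer that records ids instead of using a database.
class RecordingModel : public SketchMemoryModel {
public:
    std::vector<int64_t> written_ids;
protected:
    bool write(const Item& item) override {
        written_ids.push_back(item.db_id);
        return true;
    }
};
```

Note that the pointer returned by find() outlives the lock, so this sketch only protects the container itself; protecting the objects would need a further policy.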

Schema Driven

  • Schema is shared across memory objects and every persistence layer. Follow a DRY pattern by generating all code from a common schema definition.
  • JSON schema is elegant: as simple as possible, but no simpler. Always use it. Essential tools:
    • quicktype - use npm to install it and to incorporate code generation into CI; this gives us reflection in C++!
    • JSON formatter
  • Always provide a constructor following this format, that by default creates temporary objects that can be safely thrown away:
   // This constructor serves several purposes:
   //  1) standard full-param constructor, efficient for both deserializing and initializing
   //  2) no-param constructor for reflection via quicktype
   //  3) id constructor for loading via id + quicktype fields
   //  4) id constructor for use as key for unordered_set::find()
   StockQuoteDetails(
       int64_t db_id = PersistentIDObject::DBID_DO_NOT_SAVE,
       double quote = -1.0,
       time_t timestamp = 0,               // Set invalid by default until we get a live one
       AutotradeParameterSet* p_aps = 0    // For global autoanalysis results
   ) :
       // Call base class
       inherited(sq_max_db_id_,db_id),

       // internal members
       n_refcount_(1)
   {
       // persistent members
       quote_ = quote;
       timestamp_ = timestamp;
   }
  • Use object pointers in memory, and provide code-friendly accessors, eg:
   // EXTERNAL REFERENCES
   // NOTE we are not responsible for these allocations.
   // Access pointers to parents as references.
   void setParent(BrokerAccount& ba) { pba_ = &ba; }
   BrokerAccount& ba() { assert(pba_ != 0); return *pba_; }
   BrokerAccount& ba() const { assert(pba_ != 0); return *pba_; }

   void setStockQuote(StockQuote& sq) { psq_ = &sq; }
   StockQuote& sq() { assert(psq_ != 0); return *psq_; }
   StockQuote& sq() const { assert(psq_ != 0); return *psq_; }

   // Access nullable pointers directly.
   AnalysisData* pad_;     // the analysis parameters used to run the last analysis
  • Use simple JSON types in memory, and provide code-friendly accessors, eg:
       // Cast database JSON types to code-friendly types
       RANK_TYPE rt() const { return (RANK_TYPE)rank_type_; }
       APS_SCOPE scope() const { return (APS_SCOPE)aps_scope_; }
       ORDER_STATUS status() const { return (ORDER_STATUS)order_status_; }
       time_t tFirstSellDate() const { return (time_t)first_sell_date_; }
       double  getPctGain(GAIN_TIMEINTERVAL_TYPE gtt) const;
       void    setPctGain(GAIN_TIMEINTERVAL_TYPE gtt, const double& gain);
       int64_t getSellsCount(GAIN_TIMEINTERVAL_TYPE gtt) const;
       void    setSellsCount(GAIN_TIMEINTERVAL_TYPE gtt, int64_t count);
  • Read pattern: read entire set, then loop the set and patch/distribute each object as needed; eg:
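bool SqliteLocalModel::readUsers() {
   int_fast32_t count = 0;
   try {
       // Read the entire table into the primary set in one pass.
       if (!read("AppUsers",AppUser(),users_))
           return false;
       // Patch each object into the in-memory indexes and mark it clean.
       for (auto& u : users_)
       {
           addAppUserToMemory(u);
           u->setSaved();
           ++count;
       }
   }
   catch (...) {
       // The original excerpt ends before its handler; failing the read here is an assumption.
       return false;
   }
   return true;
}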
  • Write pattern: hopefully simple, eg:
bool SqliteLocalModel::write(AppUser& obj) { 
    json j; to_json(j,obj); return write("AppUsers",obj,j,true);
}
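
The accessor advice in this list (store simple JSON types, expose code-friendly types) can be sketched like this. Everything here is hypothetical: OrderStatus, its values, and the Order class are invented for illustration, and static_cast is used in place of the C-style casts shown above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum; the stored values are whatever the schema defines.
enum class OrderStatus : int64_t { Pending = 0, Filled = 1, Cancelled = 2 };

class Order {
public:
    // Code works with the enum...
    OrderStatus status() const      { return static_cast<OrderStatus>(order_status_); }
    void setStatus(OrderStatus s)   { order_status_ = static_cast<int64_t>(s); }

    // ...while serialization sees only the plain integer.
    int64_t rawStatus() const       { return order_status_; }

private:
    int64_t order_status_ = 0;  // simple JSON-friendly type, as generated from the schema
};
```

Keeping the stored member a plain integer means generated serialization code never needs to know about the enum.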

THE RESULT

  • Fast efficient unordered_set<PersistentIDObject*> containers that can be automatically serialized to/from any database layer
  • Fast supplemental in-memory indexes into objects wherever needed
  • DRY code that allows rapid schema changes within a complete yet simplified codebase

Delayed delete pattern

           1) to dynamically delete an object: 
               a) ba.setDeleted();
               b) do not remove from any container indexes
               c) but fix the index sorting, flags, etc., as if the object were gone, so the program functions as if it is!
                  eg: do not remove from runsByRank_, but adjust all other rankings as if the run were gone
           2) include deleted status in active check, etc.:
               // NOTE call bActive() or bInactive() directly rather than negating the other, as deleted objects return false for both.
               bool bActive() const        { return b_active_ && !bDeleted();  }
               bool bInactive() const      { return !b_active_ && !bDeleted(); }
               ---
               for (auto& psr: runsByRank_) {
                 if (psr->bDeleted()) continue;
                 ...
           3) all deletion work is done in MemoryModel::saveDirtyObjectsAsNeeded(), see that code
               a) deletion check should happen in delayed write check:
                   if (pau->bDirtyOrDeleted())
                       bNeeded = true;
               b) if bNeeded, always do deletions first, starting with greatest grandparent container, to minimize work
               c) use the erase-remove pattern to remove all deleted items in one loop
                   see code here for reference implementation: BrokerAccount::removeDeletedStockRuns()
                     i) iterate and remove item from all secondary indices
                     ii) iterate primary index, and use the lambda of the erase-remove operation to delete memory allocation and remove db record
                     iii) elements of associative containers can be erased directly through their iterators
                          sequential containers like vector require use of the erase-remove idiom
                          see reference implementation for example code!
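
The sweep in step 3c might look roughly like this. This is a hedged sketch, not BrokerAccount::removeDeletedStockRuns() itself (that code is not shown here): Run, removeDeletedRuns, and the container choices are assumptions. It uses erase-remove on a sequential secondary index and an iterator-based erase loop on the associative primary index, freeing memory only after every index has dropped the pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical object with a deleted flag set earlier by setDeleted().
struct Run {
    int  rank;
    bool deleted;
    bool bDeleted() const { return deleted; }
};

// Removes flagged runs from both indexes and frees them; returns how many were freed.
int removeDeletedRuns(std::unordered_set<Run*>& primary, std::vector<Run*>& byRank) {
    // i) Secondary index (sequential container): erase-remove idiom.
    byRank.erase(
        std::remove_if(byRank.begin(), byRank.end(),
                       [](const Run* r) { return r->bDeleted(); }),
        byRank.end());

    // ii) Primary index (associative container): erase() returns the next iterator.
    int freed = 0;
    for (auto it = primary.begin(); it != primary.end(); ) {
        if ((*it)->bDeleted()) {
            Run* doomed = *it;
            it = primary.erase(it);
            // A real model would also remove the database record at this point.
            delete doomed;
            ++freed;
        } else {
            ++it;
        }
    }
    return freed;
}
```

The secondary index is cleaned first so that no index ever holds a pointer to memory that has already been freed.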