I spent this morning exploring available tech to address this goal:

Add a big-data database to my application, to archive older data out of the realtime local model

This is for my stock app, which deals with realtime in-memory data during market hours, with a delayed-write to local storage.  At the end of the day, it can then archive most of the data collected during market hours.

Because I have not achieved “success” in life yet, at least enough to allow me to pursue my larger goals uninhibited, I have to be very careful about how I apply my limited resources.  To be more precise:

  • Follow patterns that are as simple as possible (but no less), and sustainable into the next decade.
  • For my projects, limit languages, libraries and tools to those that
    • are well maintained
    • solve difficult problems more elegantly than I could with a medium level of effort

The result of today’s philosophically-informed research:

  • The primary languages of my software projects should be JavaScript and C++
  • All data should be defined by a JSON schema that is used to generate code, via quicktype
  • Long-term libraries and tools include boost, jquery, bootstrap, accounting.js, moment.js, nlohmann::json, sqlite, postgres

Note that using quicktype with nlohmann::json is an elegant way to effectively get C++ reflection.  Once you serialize an object to JSON you can walk all its fields.  Then you can do things like automatically build SQL queries for your classes based on the JSON schema.  Beautiful.
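Here is a minimal sketch of the "build SQL from the fields" idea. To keep it dependency-free, the walked fields are stood in for by a plain vector of name/value pairs; with nlohmann::json I would expect the same loop to run over j.items() instead. The Fields alias, build_insert_sql, and the table and column names are all hypothetical, made up for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the key/value pairs you get by walking a
// serialized object (with nlohmann::json, these would come from j.items()).
using Fields = std::vector<std::pair<std::string, std::string>>;

// Build a parameterized INSERT statement from the field names alone.
// Values are meant to be bound separately, so no data ends up in the SQL text.
std::string build_insert_sql(const std::string& table, const Fields& fields)
{
    std::string columns;
    std::string placeholders;
    for (const auto& field : fields)
    {
        if (!columns.empty())
        {
            columns += ", ";
            placeholders += ", ";
        }
        columns += field.first;
        placeholders += "?";
    }
    return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ");";
}
```

Calling build_insert_sql("quotes", {{"symbol", "AAPL"}, {"price", "187.5"}}) should give back a statement with two columns and two ? placeholders, ready to bind in sqlite or postgres.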

PS. I avoided spotify-json, StaticJson/autojsoncxx, Google Protocol Buffers, Code Synthesis’s ODB, the sqlite JSON1 Extension, C++ reflection libraries like RTTR, lots of code from Stiffstream and Chilkat, etc. because while they are all brilliant and compelling, they bring extra weight.  The world keeps churning though, so keep searching.  Also, there are cases where my choices do not fit, most obviously being cross-platform mobile apps, which will have to be saved for another post… 🙂

gcc seems to be getting smarter all the time. Sentience soon…?

/home/m/development/thedigitalage/AbetterTrader/server/src/TradingModel.cpp:2095:33: note: suggested alternative: ‘ba_’
ss << "WTF account " << pa_->strDesc_ << " sr " << db_id_ << " gain: "
                        ^~~
                        ba_

After seeing this little gem on the xunitpatterns site:

(We are running all the tests before every check-in, aren’t we?)

I was glad that I baked my tests into the startup of A better Trader. As I build and run while coding, I’m constantly running my regression tests. Steady on.
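For what it's worth, here is a rough sketch of what "tests baked into startup" can look like. This is not the actual A better Trader code; StartupTest and run_startup_tests are hypothetical names, and the idea is just a list of named checks that run before anything else does.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// One named regression check; run() returns true on pass.
struct StartupTest
{
    std::string name;
    std::function<bool()> run;
};

// Run every check, report each failure, and return how many failed.
int run_startup_tests(const std::vector<StartupTest>& tests)
{
    int failures = 0;
    for (const auto& test : tests)
    {
        if (!test.run())
        {
            std::cerr << "STARTUP TEST FAILED: " << test.name << std::endl;
            ++failures;
        }
    }
    return failures;
}
```

Call it first thing in main() and refuse to start if it returns nonzero. Every build-and-run cycle then doubles as a regression run.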

This is a sweet spot of thread functionality for me at the moment, mixing boost and modern C++ (C++11 threads and atomics, plus C++14 chrono literals like 1ms), so I’m throwing it down here.  It’s also in my Reusable code on git.

// --------------------------------------------------------------------
// CONCISE EXAMPLE OF THREAD WITH EXTERNALLY-ACCESSIBLE STATUS AND DATA
// --------------------------------------------------------------------
// We create a vector, and create a thread to start stuffing it.
// Externally, we can check the status of the job, and have mutex access to the data.
// The atomic job stage is SO CHEAP to change and check, do it all day long as needed.
// Initially, externally, we check the job stage.
// Meanwhile, we do a bunch of intense work inside the mutex.
// Then we do smaller work with frequent mutexing, allowing interruption.
// Great patterns, use them brother!
// Hells to the yeah.
// --------------------------------------------------------------------
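// Requires <atomic>, <chrono>, <cstdint>, <iostream>, <thread>, <unordered_set>,
// <boost/thread/shared_mutex.hpp> and <boost/thread/locks.hpp>, plus
// "using namespace std;" and "using namespace std::chrono_literals;" (C++14).
std::atomic<int32_t> job_stage(0);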
std::unordered_set<int> data;
boost::shared_mutex data_guard;
data.insert(-1);
std::thread t([&job_stage,&data,&data_guard] {
    // stage 1 = jam in data
    job_stage = 1;
    {
        boost::upgrade_lock<boost::shared_mutex> lock(data_guard);
        boost::upgrade_to_unique_lock<boost::shared_mutex> uniqueLock(lock);
        for (int loop = 0; loop < 2000; ++loop)
        {
            std::this_thread::sleep_for(1ms);
            data.insert(loop);
        }
    }
    // stage 2 = mutex-jam data, allowing intervention
    job_stage = 2;
    for (int loop = 3000000; loop < 4000000 && job_stage == 2; ++loop)
    {
        boost::upgrade_lock<boost::shared_mutex> lock(data_guard);
        boost::upgrade_to_unique_lock<boost::shared_mutex> uniqueLock(lock);
        data.insert(loop);
    }
    cout << "thread exiting..." << endl;
});

cout << "pre mutex job stage: " << job_stage << endl;

// Six checks, matching the output below (check 0 through check 5).
for (int check = 0; check < 6; ++check)
{
    std::this_thread::sleep_for(200ms);
    // not sure why i was getting std::hex output...
    cout << "check " << check << " job stage: ";
    {
        // Read-only access: a shared lock is enough here.
        boost::shared_lock<boost::shared_mutex> lock(data_guard);
        cout  << dec << job_stage << " data size " << data.size();
    }
    cout << endl;
}

// We can trigger the thread to exit.
job_stage = 3;

// Let's see what happens if we don't join until after the thread is done.
std::this_thread::sleep_for(300ms);
cout << "done sleeping" << endl;

// NOW we will block to ensure the thread finishes.
cout << "joining" << endl;
t.join();
cout << "all done" << endl;
// --------------------------------------------------------------------

Output:

pre mutex job stage: 1
check 0 job stage: 2 data size 2031
check 1 job stage: 2 data size 225848
check 2 job stage: 2 data size 456199
check 3 job stage: 2 data size 726576
check 4 job stage: 2 data size 936429
thread exiting...
check 5 job stage: 2 data size 1002001
done sleeping
joining
all done
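
If you would rather not pull in boost for this, here is a sketch of the same pattern using only the standard library. It assumes C++17 for std::shared_mutex. The function name run_interruptible_job and its parameters are mine, invented for this example, and I collapsed the two stages into a single interruptible one to keep it short.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_set>

// A worker thread stuffs a set while the caller polls it, then the caller
// asks the worker to stop. Returns how many items the worker managed to add.
std::size_t run_interruptible_job(int max_items, int polls, std::chrono::milliseconds poll_every)
{
    // Set before the thread starts, so a stop request can never be overwritten.
    std::atomic<int> job_stage(1);   // 1 = running, 2 = stop requested
    std::unordered_set<int> data;
    std::shared_mutex data_guard;

    std::thread worker([&] {
        for (int i = 0; i < max_items && job_stage == 1; ++i)
        {
            // Writer: exclusive lock per insert, so readers can get in between.
            std::unique_lock<std::shared_mutex> lock(data_guard);
            data.insert(i);
        }
    });

    for (int poll = 0; poll < polls; ++poll)
    {
        std::this_thread::sleep_for(poll_every);
        // Reader: shared ownership is enough to look at the size.
        std::shared_lock<std::shared_mutex> lock(data_guard);
        std::cout << "poll " << poll << " data size " << data.size() << std::endl;
    }

    job_stage = 2;    // cheap atomic flag tells the worker to bail out
    worker.join();
    return data.size();   // safe without a lock: the worker has finished
}
```

Something like run_interruptible_job(1000000, 5, std::chrono::milliseconds(200)) roughly mirrors the run above. The sizes you see will vary from machine to machine.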

Diamond dependencies are everywhere, from DLL dependency hell to running an upgradable Linux distro to your node stack to figuring out your #include order.  Check out Titus’s perspective; it’s pretty tight.