DataBlocks Details

Aug 24, 2013 at 10:00 PM
Edited Aug 24, 2013 at 10:13 PM
This is an email thread from an anonymous SOP enthusiast; hopefully the thread content can be helpful for others... ;) Please see the responses at the end. Enjoy!

Hi Gerardo,
I was just reading about your SOP framework. First of all, I think it is very nicely designed and you did a more than great job. I am currently trying to create a very simple B-Tree implementation for my company. I'm really stuck on how to store the data on disk. I've been looking at the DataBlock object and need some more information.
  1. Is a datablock created for each object I save in the framework?
  2. Can you please explain how datablocks work a little bit? Is only one file required for all the objects?
I would appreciate it very much.

Thank you

Gerardo's Response:
Thanks much for the kind compliments on SOP. Enjoy!

Here are my replies to your Qs:
1.) By default, no. Each object is saved as part of the Btree Node, which occupies a contiguous space on disk, typically a set of DataBlocks.

However, a Btree store that dedicates a (set of) DataBlock(s) to each user object is also supported. The typical scenario for this is Blobs or other objects that require a large amount of storage space.

2.) A DataBlock is the smallest unit of storage on disk. It contains user data and a reference to another block (blocks are linked items). Business logic such as BtreeManager uses a list of blocks to store an object's data, e.g. a Btree Node, and it can create a higher-level storage format for persisting its objects.
Yes, one file can store all objects. A File has an ObjectStore that manages user object storage.
Optionally, the user can choose to use different ObjectStores per File. ObjectStores are hierarchical, meaning each can manage other ObjectStores as items.
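
To make the "linked blocks" idea concrete, here is a minimal C# sketch of splitting one object's serialized bytes across fixed-size blocks that point to each other. The names DataBlockSketch, BlockChainSketch, NextAddress and SizeOccupied are hypothetical and chosen for illustration only; this is not SOP's actual DataBlock class or on-disk format.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; NOT SOP's real DataBlock type.
public sealed class DataBlockSketch
{
    public long Address;          // position of this block, counted in blocks
    public long NextAddress = -1; // address of the next block; -1 marks the end of the chain
    public byte[] Data;           // fixed-capacity payload buffer
    public int SizeOccupied;      // number of payload bytes actually in use
}

public static class BlockChainSketch
{
    // Split an object's serialized bytes across fixed-size blocks and link them.
    public static List<DataBlockSketch> Split(byte[] payload, int blockSize, long firstAddress)
    {
        if (payload == null) throw new ArgumentNullException(nameof(payload));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        var blocks = new List<DataBlockSketch>();
        int offset = 0;
        do
        {
            int count = Math.Min(blockSize, payload.Length - offset);
            var block = new DataBlockSketch
            {
                Address = firstAddress + blocks.Count, // contiguous addresses (an assumption)
                Data = new byte[blockSize],
                SizeOccupied = count
            };
            Array.Copy(payload, offset, block.Data, 0, count);

            // Link the previous block to the one just created.
            if (blocks.Count > 0) blocks[blocks.Count - 1].NextAddress = block.Address;

            blocks.Add(block);
            offset += count;
        } while (offset < payload.Length);
        return blocks;
    }

    // Concatenate the used bytes of each block, in list (chain) order.
    public static byte[] Join(IReadOnlyList<DataBlockSketch> blocks)
    {
        int total = 0;
        foreach (var b in blocks) total += b.SizeOccupied;

        var result = new byte[total];
        int offset = 0;
        foreach (var b in blocks)
        {
            Array.Copy(b.Data, 0, result, offset, b.SizeOccupied);
            offset += b.SizeOccupied;
        }
        return result;
    }
}
```

For example, BlockChainSketch.Split(tenBytes, 4, 100) would yield three blocks at addresses 100, 101 and 102, the last holding only 2 used bytes, and Join would give back the original 10 bytes.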
Aug 25, 2013 at 10:32 PM
Edited Aug 27, 2013 at 6:48 AM
Below are additional thread exchanges...

<From Anonymous, Sunday Aug 25, 2013>

Thank you very much for your response Gerardo. So let me see if I understand this..

A Btree node obviously can have many key/value pairs (B-tree definition). Each key/value pair is stored in a DataBlock. Since there are many key/value pairs in this one Btree node, there are many DataBlocks that are linked together in one file?

For example:
Let's say I have the following code just to keep things simple:

SortedDictionaryOnDisk sd = new SortedDictionaryOnDisk();

How would this data be saved to disk? Please include as much detail as you can. I appreciate your help very much and would love to use your code, but I need to understand it first.

Thanks again.

On 08/25/13, wrote:

Replies below:
1.) Yes, you got it right: a Node will be stored in 1 or more DataBlocks, depending on the selected DataBlock size.
2.) In your example, the DataBlock(s) will contain:
  • All User items (your key/value list as shown above)
  • Filler items up to BTreeNode.SlotLength count: filler items are key/value pairs occupying a determined amount of space, computed based on user entries. In the above simple data type example, a 4-byte key + a 5-byte value will be allocated per filler item by the Btree Manager.
    NOTE: The concept of fillers prevents fragmentation. BTreeManager uses sampling techniques to determine the filler data size for Key and Value.
  • Other B-Tree information for internal use
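
The arithmetic behind the list above can be sketched as follows. This is a rough C# illustration, not SOP's code: NodeSizingSketch, the headerBytes parameter (standing in for the "other B-Tree information") and the example slot length of 12 are all assumptions. It also assumes that a used entry takes the same space as a filler, so the node's footprint stays constant as items are added.

```csharp
using System;

// Hypothetical illustration only; names and numbers are NOT taken from SOP.
public static class NodeSizingSketch
{
    // Number of filler entries needed to pad a node up to its slot length.
    public static int FillerCount(int slotLength, int itemCount)
    {
        if (itemCount < 0 || itemCount > slotLength)
            throw new ArgumentOutOfRangeException(nameof(itemCount));
        return slotLength - itemCount;
    }

    // Total bytes reserved for a node: header plus slotLength fixed-size entries.
    // Assumes used items and fillers occupy the same space per entry.
    public static int NodeBytes(int slotLength, int keyBytes, int valueBytes, int headerBytes)
    {
        return checked(headerBytes + slotLength * (keyBytes + valueBytes));
    }

    // Number of fixed-size blocks needed to hold nodeBytes (ceiling division).
    public static int BlocksNeeded(int nodeBytes, int blockPayloadBytes)
    {
        if (blockPayloadBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(blockPayloadBytes));
        return (nodeBytes + blockPayloadBytes - 1) / blockPayloadBytes;
    }
}
```

With the 4-byte key and 5-byte value from the reply, a hypothetical slot length of 12 and a 16-byte header, NodeBytes(12, 4, 5, 16) is 124 bytes. That fits in one 512-byte block but needs two 64-byte blocks, which is the sense in which the block count depends on the selected DataBlock size.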

On Aug 25, 2013, at 2:32 PM, Anonymous wrote:

Thank you, I realize I'm asking many questions.

I have a really stupid one... why not store all the contents of an item in only one datablock and make each datablock variable length, depending on the item it is storing?

Thanks again

(Gerardo response)

These are good Qs. ;)

About the question of why not use a variable-sized DataBlock so an item (e.g. a BTree Node) can be stored in one block... presumably Anonymous' point is to achieve more optimal reading. This is a good point.

Gerardo's answers...
A. Short response on email...
DataBlock recycling will be messy/slow if blocks are variable sized. A lot of operations in SOP are done mathematically and avoid disk I/O if possible. These kinds of solutions, e.g. uniform block sizes, made that possible.
However, SOP reads a set of contiguous Blocks in 1 async read ("swipe"). So, nothing was lost performance-wise with the current solution.
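
A small C# sketch of why uniform sizes keep things "mathematical": a block's disk offset is a single multiplication, and any recycled block fits any later request, so no size matching or extra disk lookup is needed. UniformBlockAllocatorSketch and its members are hypothetical names for illustration; SOP's real recycling logic is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; NOT SOP's allocator.
public sealed class UniformBlockAllocatorSketch
{
    private readonly int _blockSize;
    private readonly Stack<long> _recycled = new Stack<long>();
    private long _nextBlockIndex;

    public UniformBlockAllocatorSketch(int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        _blockSize = blockSize;
    }

    // Byte offset of a block, computed without touching the disk.
    public long OffsetOf(long blockIndex)
    {
        return checked(blockIndex * _blockSize);
    }

    // Index of the block that contains a given byte offset.
    public long IndexOf(long byteOffset)
    {
        return byteOffset / _blockSize;
    }

    // Reuse a recycled block if one exists; otherwise grow the file by one block.
    public long Allocate()
    {
        return _recycled.Count > 0 ? _recycled.Pop() : _nextBlockIndex++;
    }

    // Every block has the same size, so a freed block can serve any future request.
    public void Recycle(long blockIndex)
    {
        _recycled.Push(blockIndex);
    }
}
```

With variable-sized blocks, Allocate would instead have to search for a free region that is large enough, and freed regions of awkward sizes would accumulate as fragmentation.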

B. More complete answer...
** At the higher/user level, one (BTree Node) item can be thought of as occupying one logical, variable-sized block on disk. DataBlock is a low-level construct that allows algorithm format abstractions and provides contiguity of the Store's Data Segments, as mentioned. Business logic/user code is provided with a BinaryReader & BinaryWriter so it can focus on persisting the data contents of the object (using Streams) without dealing with how raw data is laid out across DataBlocks. SOP abstracts this away at the business/user level.
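
As a rough sketch of what "focus on persisting data contents" looks like from user code, the C# below writes and reads one key/value pair with the standard .NET BinaryWriter and BinaryReader over a MemoryStream. ItemSerializationSketch is a hypothetical helper; in SOP the framework supplies the reader and writer, and the hooks it uses to do so are not shown in this thread.

```csharp
using System;
using System.IO;

// Hypothetical illustration only; shows plain .NET stream serialization,
// not SOP's actual persistence interfaces.
public static class ItemSerializationSketch
{
    // Serialize one key/value pair; the caller never sees block boundaries.
    public static byte[] Write(int key, string value)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream))
            {
                writer.Write(key);   // 4 bytes
                writer.Write(value); // length-prefixed string
            }
            // ToArray is still valid after the stream has been closed.
            return stream.ToArray();
        }
    }

    // Deserialize the pair in the same order it was written.
    public static void Read(byte[] bytes, out int key, out string value)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            key = reader.ReadInt32();
            value = reader.ReadString();
        }
    }
}
```

The resulting byte array is what a lower layer would then spread across fixed-size blocks; the code that writes the object does not need to know the block size.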

** At the low level, in the DataBlocks layer...
SOP would perform slower, and its algorithms would be more complicated to implement, in order to account for uneven block sizes, if it is at all possible to accomplish them. Keeping block sizes uniform kept everything simple and made fragmentation-free Data Segments possible. Overall, it also lets the business logic perform optimally, e.g. data block recycling, transaction meta change tracking, mathematical disk address/offset calculations, etc...

However, specifically, DataBlock has details that allow business logic (e.g. B-Tree) to write & read a set of contiguous Blocks belonging to a (BTreeNode) item in minimal (async) operations, when doing so in one operation is not possible due to memory restrictions. So, nothing was lost performance-wise.
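
The contiguous read described above might look roughly like this in C#: because the blocks of one item sit next to each other and share one size, the whole range can be requested with a single seek and one buffer. ContiguousReadSketch and ReadBlocksAsync are hypothetical names; SOP's real I/O layer may work differently.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical illustration only; NOT SOP's I/O code.
public static class ContiguousReadSketch
{
    // Read blockCount contiguous blocks starting at firstBlockIndex into one buffer.
    public static async Task<byte[]> ReadBlocksAsync(
        Stream file, long firstBlockIndex, int blockCount, int blockSize)
    {
        var buffer = new byte[checked(blockCount * blockSize)];

        // One seek, computed from the block index.
        file.Seek(checked(firstBlockIndex * blockSize), SeekOrigin.Begin);

        // Usually a single ReadAsync call fills the buffer; the loop only
        // covers the case where the stream returns fewer bytes than requested.
        int read = 0;
        while (read < buffer.Length)
        {
            int n = await file.ReadAsync(buffer, read, buffer.Length - read);
            if (n == 0) throw new EndOfStreamException("File ended before all blocks were read.");
            read += n;
        }
        return buffer;
    }
}
```

If the range were too large to buffer at once, the same call could be issued over a few sub-ranges, which matches the "minimal operations" wording in the reply.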

Thx much.