Aug 25, 2013 at 10:32 PM
Edited Aug 27, 2013 at 6:48 AM
Below are additional thread exchanges...
<From Anonymous, Sunday Aug 25, 2013>
Thank you very much for your response, Gerardo. So let me see if I understand this.
A B-tree node can obviously have many key/value pairs (by the B-tree definition). Each key/value pair is stored in a DataBlock. Since there are many key/value pairs in one BTreeNode, are there many DataBlocks linked together in one file?
Let's say I have the following code just to keep things simple:
SortedDictionaryOnDisk sd = new SortedDictionaryOnDisk();
How would this data be saved to disk? Please include as many details as you can. I appreciate your help very much and would love to use your code, but I need to understand it first.
On 08/25/13, firstname.lastname@example.org
1.) Yes, you got it right: a Node will be stored in one or more DataBlocks, depending on the selected DataBlock size.
2.) In your example, DataBlock will contain:
- All User items (your key/value list as shown above)
- Filler items up to the BTreeNode.SlotLength count: filler items are key/value pairs that occupy an amount of space computed from the user entries. In the simple data type example above, a key of 4 bytes + a value of 5 bytes will be allocated per filler item by the Btree Manager.
NOTE: The concept of fillers prevents fragmentation. The BTreeManager uses sampling techniques to determine the filler data size for Key and Value.
- Other information for the B-Tree's internal use
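To make the filler arithmetic concrete, here is a minimal sketch in Python (a language-neutral model, not SOP's C# API). Only the 4-byte key and 5-byte value come from the reply above; the slot count of 12 and the 64-byte block size are made-up values, and the B-Tree's internal-use information is left out because its size is not given in this thread. It reflects one reading of the filler idea: a node reserves space for all of its slots up front, so its footprint does not change as items are added.

```python
import math

KEY_SIZE = 4      # bytes per key, from the example above
VALUE_SIZE = 5    # bytes per value, from the example above
SLOT_LENGTH = 12  # hypothetical stand-in for BTreeNode.SlotLength
BLOCK_SIZE = 64   # hypothetical DataBlock payload size in bytes


def node_payload_size(user_items):
    """Bytes taken by user items plus fillers in one node."""
    if not 0 <= user_items <= SLOT_LENGTH:
        raise ValueError("item count must fit in the node's slots")
    fillers = SLOT_LENGTH - user_items
    # Fillers take the same space as real items, so the total is constant.
    return (user_items + fillers) * (KEY_SIZE + VALUE_SIZE)


def blocks_needed(payload_bytes):
    """Number of fixed-size DataBlocks required to hold the payload."""
    return math.ceil(payload_bytes / BLOCK_SIZE)
```

Under these assumed numbers, a node with 3 items and a full node with 12 items both need the same number of blocks, so a node never has to be moved just because it filled up.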
On Aug 25, 2013, at 2:32 PM, Anonymous wrote:
Thank you, I realize I'm asking many questions.
I have a really stupid one: why not store all the contents of an item in only one DataBlock, and make each DataBlock variable length depending on the item it is storing?
These are good Qs. ;)
About the question of why not use a variable-sized DataBlock so that an Item (e.g. a BTree Node) can be stored in one block: presumably Anonymous' point is to achieve more optimal reads. This is a good point.
A. Short response on email...
DataBlock recycling would be messy/slow if blocks were variable sized. A lot of operations in SOP are done mathematically and avoid disk I/O where possible. Solutions of this kind, e.g. uniform block sizes, made that possible.
However, SOP reads a set of contiguous Blocks in one async read ("swipe"). So nothing was lost performance-wise with the current solution.
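A rough illustration of both points, written in Python as a language-neutral model rather than SOP's actual code: with a uniform block size, a block's disk offset is plain multiplication, and a run of contiguous blocks can be fetched with one read call. The 64-byte block size and the function names are assumptions, and the read here is a plain synchronous one on an in-memory buffer; SOP's async read is not modeled.

```python
import io

BLOCK_SIZE = 64  # hypothetical uniform DataBlock size in bytes


def block_offset(block_index):
    """Byte offset of a block, computed without touching the disk."""
    return block_index * BLOCK_SIZE


def read_blocks(f, first_block, count):
    """Read `count` contiguous blocks with a single read call, then slice."""
    f.seek(block_offset(first_block))
    data = f.read(count * BLOCK_SIZE)
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


# Usage: an in-memory "file" of 4 blocks, each filled with its own index.
store = io.BytesIO(b"".join(bytes([i]) * BLOCK_SIZE for i in range(4)))
blocks = read_blocks(store, first_block=1, count=2)
```

With variable-sized blocks, `block_offset` would need a lookup table (and possibly extra disk reads) instead of arithmetic.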
B. More complete answer...
** At the higher/user level, one (BTree Node) item can be thought of as occupying one logical, variable-sized block on disk. DataBlock is a low-level construct that allows algorithm format abstractions and provides contiguousness of the Store's Data Segments, as mentioned.
Business logic/user code is provided with a BinaryReader & BinaryWriter so that it can focus on persisting the data contents of the Object (using Streams) and does not have to deal with how raw data is laid out across DataBlocks. SOP abstracts this away at the business/user level.
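The layering described here can be sketched as follows, in Python rather than .NET, so this is a model of the idea and not SOP's BinaryReader/BinaryWriter usage. User-level code only produces and consumes a byte stream; a separate lower layer cuts that stream into fixed-size blocks. The wire format (4-byte key, 2-byte length, value bytes), the 16-byte block size, and all function names are invented for the example.

```python
import struct

BLOCK_SIZE = 16  # hypothetical, small so the split is visible


def serialize(key, value):
    """User-level code: write a 4-byte key and a length-prefixed value."""
    return struct.pack("<iH", key, len(value)) + value


def to_blocks(payload):
    """Lower layer: cut the payload into fixed-size blocks, zero-padding the last."""
    padded_len = -(-len(payload) // BLOCK_SIZE) * BLOCK_SIZE  # round up
    padded = payload.ljust(padded_len, b"\x00")
    return [padded[i:i + BLOCK_SIZE] for i in range(0, padded_len, BLOCK_SIZE)]


def deserialize(blocks):
    """User-level code: read the item back from the reassembled stream."""
    data = b"".join(blocks)
    key, length = struct.unpack_from("<iH", data, 0)
    return key, data[6:6 + length]


blocks = to_blocks(serialize(42, b"hello, data blocks!"))
item = deserialize(blocks)
```

Neither `serialize` nor `deserialize` knows the block size; changing `BLOCK_SIZE` affects only `to_blocks`.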
** At low-level in DataBlocks layer...
SOP would perform slower, and its algorithms would be more complicated to implement, if they had to account for uneven block sizes, if that is possible at all. Keeping block sizes uniform kept everything simple and made fragmentation-free Data Segments possible. Overall, it also lets the business logic perform optimally, e.g. data block recycling, transaction meta change tracking, mathematical disk address/offset calculations, etc.
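As an illustration of why uniform sizes keep recycling simple, here is a small Python sketch (a generic free-list model, not SOP's recycling code; the class and method names are made up). Because every block is the same size, any freed block can satisfy any later request, so there is no best-fit search and no leftover gaps.

```python
class BlockAllocator:
    """Hands out block indexes; freed blocks are reused before the file grows."""

    def __init__(self):
        self.next_new = 0  # index of the next never-used block
        self.free = []     # indexes of recycled blocks

    def allocate(self):
        # Any recycled block fits, since all blocks have the same size.
        if self.free:
            return self.free.pop()
        index = self.next_new
        self.next_new += 1
        return index

    def release(self, index):
        self.free.append(index)


alloc = BlockAllocator()
a, b, c = alloc.allocate(), alloc.allocate(), alloc.allocate()
alloc.release(b)
reused = alloc.allocate()  # takes the freed block instead of growing the file
```

With variable-sized blocks, `allocate` would have to search for a free region that is large enough and then track whatever space is left over.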
However, specifically, DataBlock carries details that allow business logic (e.g. the B-Tree) to write & read a set of contiguous Blocks belonging to a (BTreeNode) Item in a minimal number of (async) operations, when a single operation is not possible due to memory restrictions. So nothing was lost performance-wise.