I have a JSON object with a huge array of nested objects. Let us assume it consists of records of license plates for vehicles. It would contain necessary fields like licenseID, issuingState, dateOfIssue, driverID etc.
What I am having trouble with is how to store data that is only used in exceptional cases, such as a field indicating that the license plate belongs to a foreign embassy (isEmbassyOwned), is owned by a government entity (isGovernmentOwned), or is a learner license (isLearner), alongside non-Boolean fields that would be empty, 0, or similar when there is no information for them. These exceptional scenarios occur in less than 10% of all object instances.
I am unsure which format would be best for storing this kind of data while balancing minimal storage consumption against human readability. Should I declare the fields for all objects regardless, or only include them when they are not empty? Should I store them in a dedicated array instead, or maybe just introduce a code value to be used by a switch/case statement in the interpreter? Or is there some other approach I am not aware of?
IMO if you’re even slightly concerned about storage you should be using a DBMS instead of JSON files. They will handle sparse data, compression, and fast access better than a text-based file format.
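To make the DBMS suggestion a bit more concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table and column names (`license_plates`, `special_case`) are made up for illustration and are not from the original question; the point is only that a nullable column handles the "rare field" case without any per-record bookkeeping.

```python
import sqlite3

# In-memory database, just for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE license_plates (
        license_id    TEXT PRIMARY KEY,
        issuing_state TEXT NOT NULL,
        special_case  TEXT  -- hypothetical column; NULL for ordinary plates
    )
    """
)
conn.executemany(
    "INSERT INTO license_plates VALUES (?, ?, ?)",
    [
        ("ABC123", "NY", None),
        ("GOV001", "NY", "government"),
        ("XYZ789", "CA", None),
    ],
)

# Only the rare special-case rows come back.
special = conn.execute(
    "SELECT license_id, special_case FROM license_plates "
    "WHERE special_case IS NOT NULL"
).fetchall()
print(special)
```

Whether a full DBMS is worth it depends on how the data is consumed; for a file that is only read whole by one script, plain JSON may still be simpler.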
If it’s something that represents mutually exclusive states, like the license plate examples (Gov’t, Embassy, Learner), an enum like 4wd mentioned is a better idea than many boolean keys. This would also cover the switch/case question you posed. I would include the “regular case” in the enum, but if you create an enum that only contains “special cases”, you can always set it to null.
As for booleans, I would suggest avoiding them unless they are necessary and the value is truly binary (as in two-option, not binary numbers) and self-contained in one key (obligatory anti-boolean video). If the boolean is there to say what a different key’s object represents, you don’t need it (see: enums; you’ll thank yourself later if you add a third option). If it is there to say that another key contains value(s), you don’t need it either. Many languages can handle the idea of “data is present, or not present” (either with “truthy/falsey” behavior interpreting “data-or-null”, or with “Maybe/Option” types), so “data-or-null” can often suffice instead of booleans.
I would suggest always including all keys of a present object, even if a key’s value is null or not applicable. It will prevent headaches later when code tries to access a key that isn’t present. This approach might also help you reduce the number of keys, either by consolidating them (as in converting booleans to a state-like enum, as mentioned above) or by removing them (if unused and/or deprecated).
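As a rough illustration of the "include every key" and "data-or-null" advice, here is a minimal Python sketch. The `specialCase` and `notes` keys and the `normalize_record` helper are hypothetical names invented for this example; the idea is just to fill in missing keys with null once, at load time, so later code can index them without checking.

```python
import json

# Hypothetical set of keys every record is expected to carry.
# None is what JSON null becomes in Python.
DEFAULTS = {
    "licenseID": None,
    "issuingState": None,
    "specialCase": None,  # e.g. "embassy", "government", "learner", or null
    "notes": None,
}


def normalize_record(record: dict) -> dict:
    """Return a copy of record with every expected key present."""
    return {**DEFAULTS, **record}


raw = (
    '[{"licenseID": "ABC123", "issuingState": "NY"},'
    ' {"licenseID": "XYZ789", "issuingState": "CA", "specialCase": "embassy"}]'
)
records = [normalize_record(r) for r in json.loads(raw)]

for r in records:
    # Safe: the key always exists, and "data-or-null" stands in for a boolean flag.
    if r["specialCase"] is not None:
        print(r["licenseID"], "is a special case:", r["specialCase"])
```

If the nulls are written back into the file, every record grows a little; filling them in only after parsing keeps the file small while giving the code a uniform shape.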
Though I know very little about enums and have never used them before, I think this is what I needed. I couldn’t imagine there would be a type exactly for this purpose, since I might add or deprecate data later on. I will need time to understand how to restructure the current JSON object to accommodate enums, but I think it will be worth it. Thanks for your time!
When the enum reaches your JSON, it will have to be a string (as JSON does not have a dedicated “enum” type). But it at least ensures that languages parsing your JSON will have a consistent set of strings to read.
Consider this small bit of Elm code (you may not be an Elm dev, and that’s okay; it’s the concept that you should look to get):
```elm
-- A Directions "enum" type with four options:
-- North, East, South, West
type Directions = North | East | South | West

-- How to turn each Directions into a String
-- Which can then be encoded in JSON
directionsToString : Directions -> String
directionsToString direction =
    case direction of
        North -> "north"
        East -> "east"
        South -> "south"
        West -> "west"

-- "Maybe Directions" since not all strings can be parsed as a Directions.
-- The return will be "Just <something>" or "Nothing"
directionsFromString : String -> Maybe Directions
directionsFromString dirString =
    case dirString of
        "north" -> Just North
        "east" -> Just East
        "south" -> Just South
        "west" -> Just West
        _ -> Nothing
```
The two functions (directionsFromString and directionsToString) are ready to be used as part of JSON handling: to read a String from a key and turn it into a Directions enum member, or to turn a Directions into a String and insert that string as a key’s value.
But all that aside, for your restructuring, and keeping with the license plate example, both type and license number could be contained in a small object. For example:
{
  ...
  "licensePlate": {
    "type": "government",    <- an enum in the language parsing this, but a string in JSON
    "plateNumber": "ABC123"
    ...
  }
  ...
}
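For readers who don't use Elm, the same round trip might look roughly like this in Python, applied to the license plate object above. This is only a sketch: the `PlateType` class, its member names, and the `plate_type_from_string` helper are assumptions made for illustration, not part of any existing schema.

```python
import json
from enum import Enum
from typing import Optional


class PlateType(Enum):
    # Member values are the strings that appear in the JSON (assumed names).
    REGULAR = "regular"
    GOVERNMENT = "government"
    EMBASSY = "embassy"
    LEARNER = "learner"


def plate_type_from_string(value: str) -> Optional[PlateType]:
    """Return the matching member, or None for an unknown string."""
    try:
        return PlateType(value)
    except ValueError:
        return None


doc = json.loads('{"licensePlate": {"type": "government", "plateNumber": "ABC123"}}')
plate_type = plate_type_from_string(doc["licensePlate"]["type"])

# Writing back out: the enum member becomes a plain string again.
encoded = json.dumps(
    {"licensePlate": {"type": plate_type.value, "plateNumber": "ABC123"}}
)
```

Returning `None` for unknown strings plays the same role as `Nothing` in the Elm version: the caller has to decide what an unrecognized type means instead of silently carrying a bad value around.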
… why does it need to be json?
What about using enums? In this case you will have to specify them for all records, but this ensures that the field will always be present.
enum license_owner {
    regular_citizen = 0,
    embassy,
    government,
    ...
}
I’ve heard about enums before, but I never really paid attention to them since I never needed them in any of my projects until now. I think this is exactly what I need. I’ll research them more.
Thank you so much for your help
If they are mutually exclusive special cases, using an enum like another comment mentioned makes sense, and can limit the special cases to one field. You can use an enum of strings if you want it to be more readable.
As for how the data is represented, only including the special case field when there is one makes sense as well. Keep in mind JSON is also a flexible format - you can even have the array contain mixed types, like strings for simple licenses, and objects for more complex licenses. That can reduce the size of the JSON document quite a bit, if that’s an option.
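A minimal sketch of the mixed-type idea, in Python. The layout is an assumption made for this example: a bare string is taken to be a regular plate number, and an object carries the extra fields for a special case. The `expand` helper and the key names are hypothetical.

```python
import json

# Hypothetical layout: strings for simple licenses, objects for special ones.
raw = '["ABC123", "DEF456", {"plateNumber": "GOV001", "specialCase": "government"}]'


def expand(entry):
    """Turn either form into the same dict shape for downstream code."""
    if isinstance(entry, str):
        return {"plateNumber": entry, "specialCase": None}
    # Objects keep their own fields; specialCase defaults to None if absent.
    return {"specialCase": None, **entry}


plates = [expand(e) for e in json.loads(raw)]
print(plates)
```

The trade-off is that every reader of the file needs an equivalent of `expand`, so the space saving comes at the cost of a less self-describing format.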
Depending on your needs you can also break it into a columnar format with some standard compression on top. This allows you to search individual fields without looking at the rest.
It also compresses exceptionally well: “rare” fields will be null in most records, so run-length encoding will compress them to near zero.
See, for example, Parquet.
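To show why a mostly-null column shrinks so much, here is a toy Python sketch of turning row-oriented records into columns and run-length encoding one of them. It illustrates the idea only and is not how Parquet is actually implemented; the record contents and the `run_length_encode` helper are invented for the example.

```python
from itertools import groupby

records = [
    {"licenseID": "A1", "specialCase": None},
    {"licenseID": "A2", "specialCase": None},
    {"licenseID": "A3", "specialCase": None},
    {"licenseID": "A4", "specialCase": "embassy"},
    {"licenseID": "A5", "specialCase": None},
    {"licenseID": "A6", "specialCase": None},
]

# Row-oriented -> column-oriented: one list per field.
columns = {key: [r[key] for r in records] for key in records[0]}


def run_length_encode(values):
    """Collapse consecutive equal values into [value, count] pairs."""
    return [[value, len(list(group))] for value, group in groupby(values)]


encoded = run_length_encode(columns["specialCase"])
print(encoded)  # [[None, 3], ['embassy', 1], [None, 2]]
```

With special cases in under 10% of records, the runs of nulls get long, so the encoded column stays tiny no matter how many ordinary records there are.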
If storage space is important, using uncompressed JSON is a bad choice. If you’re compressing the JSON, it doesn’t really matter if you have lots of
exceptionCase: False
fields as they will compress very well.
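A quick way to check this claim on your own data is to compare raw and gzipped sizes with and without the repeated field. The sketch below uses synthetic records (the `licenseID` values and the record count are made up), so the exact numbers it prints say nothing about a real dataset.

```python
import gzip
import json

# 10,000 synthetic records, with and without the rarely-true flag.
with_flag = [{"licenseID": f"ID{i:05d}", "exceptionCase": False} for i in range(10_000)]
without_flag = [{"licenseID": f"ID{i:05d}"} for i in range(10_000)]


def sizes(records):
    """Return (raw_bytes, gzipped_bytes) for the JSON encoding of records."""
    raw = json.dumps(records).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


raw_with, gz_with = sizes(with_flag)
raw_without, gz_without = sizes(without_flag)

print("raw:    ", raw_with, "vs", raw_without)
print("gzipped:", gz_with, "vs", gz_without)
```

The gap between the two gzipped sizes should be far smaller than the gap between the raw sizes, which is the point being made above: repeated identical fields cost little once compression is applied.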