I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t really able to make sense of it. That, combined with not having a practical use for it, leaves SQL largely a black box in my mind (though I am somewhat familiar with technical database concepts).

With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I’d love it if you could point me to technical documentation for that.

  • 9point6@lemmy.world
    arrow-up
    200
    ·
    6 days ago

    The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

    The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

    If he knew the domain, he would know this isn’t an issue. If he knew the technology, he would be able to see the constraint and, after some investigation, reach the conclusion that it’s not an issue.
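
    A minimal sketch of the composite-key idea, assuming a made-up person table (nothing in this thread shows the real SSA schema, so the table, the columns, and the placeholder SSN are purely illustrative). It uses Python’s built-in sqlite3 module to show that a key on (SSN, birth date) allows a repeated SSN but rejects a repeated pair:

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        ssn        TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        full_name  TEXT NOT NULL,
        PRIMARY KEY (ssn, birth_date)  -- uniqueness applies to the pair
    )
    """
)

# Same SSN, different birth dates: allowed, because the pair differs.
conn.execute("INSERT INTO person VALUES ('000-00-0001', '1950-01-01', 'Alice Example')")
conn.execute("INSERT INTO person VALUES ('000-00-0001', '1982-06-15', 'Bob Example')")

# Same SSN and same birth date: rejected by the primary-key constraint.
try:
    conn.execute("INSERT INTO person VALUES ('000-00-0001', '1950-01-01', 'Alice Again')")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

rows_for_ssn = conn.execute(
    "SELECT COUNT(*) FROM person WHERE ssn = '000-00-0001'"
).fetchone()[0]
print(rows_for_ssn, pair_rejected)
```

    Counting rows per SSN in a table like this would report a “duplicate” even though the key constraint is doing exactly what it was designed to do.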

    The man continues to be a malignant moron

    • snooggums@lemmy.world
      English
      arrow-up
      25
      arrow-down
      2
      ·
      edit-2
      6 days ago

      The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

      Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.

      Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people and I don’t know how to find the start of the thread on twitter since I only use it when I accidentally click on a link to it.

      https://www.ssa.gov/history/hfaq.html

      Q20: Are Social Security numbers reused after a person dies?

      A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.

      • halcyonloon@midwest.social
        English
        arrow-up
        21
        ·
        6 days ago

        Take this with a grain of salt as I’m not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.

        In my experience reduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.

        • DahGangalang@infosec.pubOP
          arrow-up
          18
          ·
          edit-2
          6 days ago

          A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated “in the database”.

          Of all the comments so far, I find yours the most compelling.

          • Barbarian@sh.itjust.works
            arrow-up
            11
            ·
            edit-2
            6 days ago

            Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:

            SSN_Table

            ID | SSN | Other info

            Other_Table

            ID | SSN_ID | Other info

            When you want to connect them to have both sets of info, it’d be the following:

            SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID

            EDIT: Oh, just to clear up any confusion, the SSN_ID in this simple example is not the SSN itself. To access that in this example query, it’d be SSN_Table.SSN
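
            For anyone who wants to run that join, here is a small sketch using Python’s built-in sqlite3 module. The table and column names come from the example above; the column types, the UNIQUE constraint, and the sample rows are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SSN_Table (
        ID  INTEGER PRIMARY KEY,
        SSN TEXT NOT NULL UNIQUE          -- each SSN is stored exactly once
    );
    CREATE TABLE Other_Table (
        ID         INTEGER PRIMARY KEY,
        SSN_ID     INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Other_Info TEXT
    );
    INSERT INTO SSN_Table (ID, SSN) VALUES (1, '000-00-0001'), (2, '000-00-0002');
    INSERT INTO Other_Table (ID, SSN_ID, Other_Info) VALUES
        (10, 1, 'first record'),
        (11, 1, 'second record'),
        (12, 2, 'third record');
    """
)

# The join from the comment, narrowed to two columns for readability.
joined = conn.execute(
    """
    SELECT SSN_Table.SSN, Other_Table.Other_Info
    FROM SSN_Table
    JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
    ORDER BY Other_Table.ID
    """
).fetchall()

for row in joined:
    print(row)
```

            The first SSN shows up on two result rows even though SSN_Table holds it only once, which is the kind of repetition that can look like “duplicates” in query output.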

            • DahGangalang@infosec.pubOP
              arrow-up
              2
              ·
              6 days ago

              Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn’t need access to the raw SSN, but did need access to less personal data, I could see those stored in separate tables.

              But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.

              • Barbarian@sh.itjust.works
                arrow-up
                5
                ·
                edit-2
                6 days ago

                It’s necessary to split it out into different tables if you have a one-to-many relationship. Let’s say you have a list of driver licenses the person has had over the years, for example. Then you’d need the second table. So something like this:

                SSN_Table

                ID | SSN | Other info

                Driver_License_Table

                ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info

                Then you could do something like pull up a person’s latest driver’s license, or list all the ones they had, or pull up the SSN associated with that license.
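
                A rough sketch of those three lookups with Python’s sqlite3 module. The table and column names follow the layout above; the types, dates, and IDs are made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SSN_Table (ID INTEGER PRIMARY KEY, SSN TEXT NOT NULL UNIQUE);
    CREATE TABLE Driver_License_Table (
        ID          INTEGER PRIMARY KEY,
        SSN_ID      INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Issue_Date  TEXT NOT NULL,   -- ISO dates sort correctly as text
        Expiry_Date TEXT NOT NULL
    );
    INSERT INTO SSN_Table VALUES (1, '000-00-0001');
    INSERT INTO Driver_License_Table VALUES
        (100, 1, '2005-03-01', '2013-03-01'),
        (101, 1, '2013-02-15', '2021-02-15'),
        (102, 1, '2021-02-01', '2029-02-01');
    """
)

# Every license the person has held, oldest first.
all_licenses = conn.execute(
    "SELECT ID FROM Driver_License_Table WHERE SSN_ID = 1 ORDER BY Issue_Date"
).fetchall()

# Only the most recently issued license.
latest = conn.execute(
    """
    SELECT ID, Issue_Date FROM Driver_License_Table
    WHERE SSN_ID = 1 ORDER BY Issue_Date DESC LIMIT 1
    """
).fetchone()

# Going the other way: the SSN behind a given license.
ssn_for_license = conn.execute(
    """
    SELECT SSN_Table.SSN
    FROM Driver_License_Table
    JOIN SSN_Table ON SSN_Table.ID = Driver_License_Table.SSN_ID
    WHERE Driver_License_Table.ID = 101
    """
).fetchone()[0]

print(all_licenses, latest, ssn_for_license)
```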

                • Arcka@midwest.social
                  English
                  arrow-up
                  2
                  ·
                  5 days ago

                  I think a likely scenario would be for name changes, such as taking your partner’s surname after marriage.

        • Ephera@lemmy.ml
          English
          arrow-up
          6
          ·
          6 days ago

          The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
          But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person’s name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.

        • snooggums@lemmy.world
          English
          arrow-up
          6
          ·
          6 days ago

          It is common for long lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and the other might have it with hyphens in the same database.

          Hell, I work in a state agency and one of our older databases has a dozen tables with phone numbers.

          • One has the whole thing as a long int: 2223334444
          • One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
          • One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
          • One has the whole thing with parentheses, a space, and a hyphen as a string: (222) 333-4444

          The main reason for the discrepancy is not looking at what was used before or not understanding that they can always change the formatting when displayed so they don’t need to include the parentheses or hyphens in the database itself.
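
          As a sketch of how values stored in formats like the ones listed above can still be compared, the usual trick is to reduce each one to digits before matching. The helper name and the 10-digit assumption are mine, not taken from any real system:

```python
import re


def normalize_phone(value):
    """Reduce any of the stored formats to a plain 10-digit string."""
    digits = re.sub(r"\D", "", str(value))
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {digits!r}")
    return digits


stored_variants = [
    2223334444,          # long int
    "2223334444",        # plain string
    "222" + "333-4444",  # area code and the rest kept in separate fields
    "(222) 333-4444",    # fully formatted string
]

normalized = {normalize_phone(v) for v in stored_variants}
print(normalized)
```

          All four variants collapse to the same value once normalized, while a naive equality check on the raw values would treat them as four different numbers.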

      • DahGangalang@infosec.pubOP
        arrow-up
        3
        ·
        6 days ago

        Beat me to asking this follow up, though you linking additional resources is probably more effort than I would have put in. Thanks for that!

  • missingno@fedia.io
    arrow-up
    73
    arrow-down
    2
    ·
    6 days ago

    Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.

  • valtia@lemmy.world
    arrow-up
    18
    ·
    edit-2
    5 days ago

    There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
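
    A small sketch of what a history-keeping table in that style (often called SCD type 2) could look like, using Python’s sqlite3 module. The table name, the columns, and the names in the rows are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person_history (
        row_id     INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL,
        full_name  TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,              -- NULL marks the current row
        UNIQUE (ssn, valid_from)
    );
    -- A name change adds a new row instead of overwriting the old one.
    INSERT INTO person_history (ssn, full_name, valid_from, valid_to) VALUES
        ('000-00-0001', 'Jane Smith', '1990-01-01', '2015-06-30'),
        ('000-00-0001', 'Jane Jones', '2015-06-30', NULL);
    """
)

# Full history for one SSN: two rows, same SSN, by design.
history = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? ORDER BY valid_from",
    ("000-00-0001",),
).fetchall()

# The current record is the one whose validity range is still open.
current = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? AND valid_to IS NULL",
    ("000-00-0001",),
).fetchone()[0]

print(history, current)
```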

    Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases and their own structure and design that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since the SSN on your card actually has dashes, and those dashes make the number into a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal db is using a number field instead of text), there can be some data loss, resulting in a NULL.
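
    To make that guess concrete, here is a purely hypothetical sketch of two import routines in Python. Neither is based on any real government code; they only illustrate how a dashed string can turn into a missing value, or lose information, when forced into a number:

```python
import re


def ssn_strict_number(raw):
    """Hypothetical strict import: anything that is not exactly 9 digits becomes None."""
    if raw is not None and re.fullmatch(r"[0-9]{9}", raw):
        return int(raw)
    return None


def ssn_cleaned_number(raw):
    """Hypothetical lenient import: strip separators first, then convert."""
    digits = re.sub(r"[^0-9]", "", raw or "")
    return int(digits) if len(digits) == 9 else None


dashed = "000-12-3456"
strict_result = ssn_strict_number(dashed)    # None: the dashes make the import fail
cleaned_result = ssn_cleaned_number(dashed)  # 123456: leading zeros vanish in a number
print(strict_result, cleaned_result)
```

    The second result hints at why identifiers like this are usually kept as text: a numeric column silently drops leading zeros.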

    • DreamlandLividity@lemmy.world
      arrow-up
      3
      ·
      edit-2
      5 days ago

      Another accusation Elon made was that payments are going to people missing SSNs.

      A much simpler answer is that not all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don’t get an SSN.

    • DarthKaren@lemmy.world
      arrow-up
      2
      ·
      5 days ago

      JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.

      • GoodEye8@lemm.ee
        English
        arrow-up
        2
        ·
        4 days ago

        Hypothetically you could have a separate “previous names” table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about because we know that due to human error SSN-s are not unique and you can’t enforce uniqueness on SSN-s without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.

  • SloppyPuppy@lemmy.world
    arrow-up
    34
    arrow-down
    2
    ·
    edit-2
    6 days ago

    As a data engineer for the past 20+ years: There is absolutely no fucking way that the US gov doesn’t use SQL. This is what shows that he’s stupid not only in SQL but in data science in general.

    Regarding duplications: it’s more nuanced than the statements each side put out. There can be duplications in certain situations. In some situations there shouldn’t be. And I don’t really see how duplications in a db would open it to fraud.

    • DahGangalang@infosec.pubOP
      arrow-up
      2
      ·
      5 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • ExFed@lemm.ee
        arrow-up
        5
        ·
        5 days ago

        If it’s used as an identifier to link together rows from different tables. Also known as “joining” tables. SSN (with birthdate) is a unique identifier, and so it’s natural to choose as a primary/foreign key.

        • Sparking@lemm.ee
          English
          arrow-up
          2
          ·
          5 days ago

          It really is baffling trying to make sense of what he is saying. It’s like the only explanation that makes any sense at all is that he has no idea what he is talking about. Even if he knew just cursory knowledge about database cardinality you wouldn’t say stuff so stupid.

      • abigscaryhobo@lemmy.world
        arrow-up
        3
        arrow-down
        1
        ·
        5 days ago

        It doesn’t matter without scope. Are we looking at a database of SSNs? tax records? A sign in log? The social security number database might require uniques in some way, but tax records could be the same person over multiple years. A sign in gives a unique identifier but you could be signing in every day.

        It’s like saying a car VIN shows up multiple times in a database. Where? What database? Was it sold? Tickets? Registered every year?

        This is nothing more than an “assume I mean immigrants or tax fraud and get mad!” inflammatory statement with no proof or reason.

  • John Doe@lemmy.world
    arrow-up
    35
    arrow-down
    1
    ·
    edit-2
    6 days ago

    Musk’s statement about the government not using SQL is false. I worked for FEMA for fourteen years, a decade of which was as a Reports Analyst. I wrote Oracle SQL+ code to pull data from a database and put it into spreadsheets. I know, I know. You’re shocked that Elon Musk is wrong. Please remain calm.

    • whoisearth@lemmy.ca
      arrow-up
      11
      ·
      6 days ago

      I work for a crown corp in Canada we have, off the top of my head, about 800 MSSQL, Oracle, MySQL/MariaDB, Postgres databases across the org (I manage our CMDB). Musk is a retard. The world runs on SQL.

      He wouldn’t know this though because he’s a techbro that builds apps with MongoDB because he doesn’t understand what normalizing data is and why SQL is the best option for 99.9999999% of applications.

      Fucking idiots.

    • DahGangalang@infosec.pubOP
      arrow-up
      2
      arrow-down
      1
      ·
      5 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

  • darkmarx@lemmy.world
    English
    arrow-up
    48
    arrow-down
    1
    ·
    6 days ago

    “The government” is multiple agencies and departments. There is no single computer system, database, mainframe, or file store that the entire US government uses. There is no standard programming language used. There is no standard server configuration. Each agency is different. Each software project is different.

    When someone says the government doesn’t use SQL, they don’t know what they are talking about. It could be referring to the fact that many government systems are ancient mainframe applications that store everything in VSAM. But it is patently false that the government doesn’t use SQL. I’ve been on a number of government contracts over the years, spanning multiple agencies. MsSQL was used in all but one.

    Furthermore, some people share SSNs; they are not unique. It’s a common misconception that they are, but anyone working on government software learns this pretty quickly. The fact that it seems to be a big shock goes to show that he doesn’t know what he is doing and neither do the people reporting to him.

    Not only is he failing to understand the technology, he is failing to understand the underlying data he is looking at.

    • DahGangalang@infosec.pubOP
      arrow-up
      7
      ·
      edit-2
      6 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the Vice Bro doesn’t understand how SQL works).

      I’m not aware of any instance where two people share an SSN though. The Social Security Administration even goes as far as to say they don’t recycle the SSNs of dead people (it’s linked a couple times in other comments and Voyager doesn’t let me save drafts of comments, so I’ll make an edit to this comment with that link for you).

      Can you point me to somewhere showing multiple people can share an SSN?

      Edit: as promised: The Social Security FAQ page

      • ryegye24@midwest.social
        English
        arrow-up
        8
        ·
        6 days ago

        Assuming the whole “duplicate SSN” thing isn’t just a complete fabrication, we have no idea what table he was even looking at! A table of transactions e.g. would have a huge number of duplicate SSNs.

        • homicidalrobot@lemm.ee
          arrow-up
          8
          ·
          6 days ago

          The fact that SSN aren’t singular identifiers has been public knowledge for quite a while. ID analytics has shown in over a decade of studies that some people have multiple SSN attached to their name, while some (over five million) SSN are used by three or more living individuals. If you search “ID analytics SSN” you’ll find loads of articles reporting on this dating back to 2010 and a bit before.

      • WarlordSdocy@lemmy.world
        arrow-up
        6
        ·
        6 days ago

        I mean I don’t know a ton about SQL but one thing to keep in mind about SSNs is they were not originally meant to be used for identification but because we have no form of national id and places still needed a way to verify who you are people just started using SSNs for that since it’s something everyone has and there wasn’t really a better option. So now the government has been having to try and make them work for that and make them more secure. The better solution would be to make some form of national id that is designed to be secure but Republicans and people like Musk would probably call that government overreach or a way to spy and track people.

        • DahGangalang@infosec.pubOP
          arrow-up
          2
          ·
          6 days ago

          Ugh, YES, I am so frustrated at the counter arguments for this that I constantly hear spouted by my (ultra-conservative) family.

          I hope that notion re-enters the public consciousness as a part of this (not holding my breath tho)

      • socsa@piefed.social
        English
        arrow-up
        4
        arrow-down
        1
        ·
        6 days ago

        My wife has a tax payment history under two different legal names which share a single SSN

        • DahGangalang@infosec.pubOP
          arrow-up
          2
          ·
          6 days ago

          Hmmm, well I can’t speak to how the actual databases are put together, so maybe they would have that as two separate unique primary keys with a duplicated SSN.

          But it really seems like bad design if they put it together that way…

          • JoeyJoeJoeJr@lemmy.ml
            arrow-up
            1
            ·
            6 days ago

            Worth noting is that “good” database design evolved over time (https://en.wikipedia.org/wiki/Database_normalization). If anything was set up pre-1970s, they wouldn’t have even had the conception of the normal forms used to cut down on data duplication. And even after they were defined, it would have been quite a while before the concepts trickled down from academia to the engineers actually setting up the databases in production.

            On top of that, name to SSN is a many-to-many relationship - a single person can legally change their name, and may have to apply for a new SSN (e.g. in the case of identity theft). So even in a well normalized database, when you query the data in a “useful” form (e.g. results include name and SSN), it’s probably going to appear as if there are multiple people using the same SSN, as well as multiple SSNs assigned to the same person.

      • kboy101222@sh.itjust.works
        English
        arrow-up
        2
        arrow-down
        1
        ·
        6 days ago

        I’d imagine the numbers of dead people eventually get cycled around to. 9 digits only gives you 999,999,999 people to go through, and we have over a third of that in existence right now.

  • nednobbins@lemm.ee
    arrow-up
    10
    arrow-down
    1
    ·
    edit-2
    4 days ago

    It’s so basic that documentation is completely unnecessary.

    “De-duping” could mean multiple things, depending on what you mean by “duplicate”.

    It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.

    It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.

    A database system as large as the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
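
    The two meanings of “duplicate” described above can be told apart with two short queries. This is a sketch with an invented contribution table and made-up rows, run through Python’s sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contribution (
        ssn          TEXT NOT NULL,
        paid_on      TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO contribution VALUES
        ('000-00-0001', '2024-01-31', 50000),
        ('000-00-0001', '2024-02-29', 50000),
        ('000-00-0001', '2024-02-29', 50000),  -- exact copy of the row above
        ('000-00-0002', '2024-01-31', 42000);
    """
)

# Meaning 1: the entire row is repeated.
full_row_dupes = conn.execute(
    """
    SELECT ssn, paid_on, amount_cents, COUNT(*)
    FROM contribution
    GROUP BY ssn, paid_on, amount_cents
    HAVING COUNT(*) > 1
    """
).fetchall()

# Meaning 2: a single value (the SSN) appears on more than one row.
repeated_ssns = conn.execute(
    """
    SELECT ssn, COUNT(*)
    FROM contribution
    GROUP BY ssn
    HAVING COUNT(*) > 1
    ORDER BY ssn
    """
).fetchall()

print(full_row_dupes, repeated_ssns)
```

    Only the first query points at something that might be a data-entry problem; the second just reflects that one account has several contributions.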

  • GaMEChld@lemmy.world
    arrow-up
    38
    arrow-down
    2
    ·
    6 days ago

    Because of course the government uses SQL. It’s as stupid as saying the government doesn’t use electricity or something equally stupid. The government is myriad agencies running myriad programs on myriad hardware with myriad people. My damned computers at home are using at least 2-3 SQL databases for some of the programs I run.

    SQL is damn near everywhere where data sets are found.

    • DahGangalang@infosec.pubOP
      arrow-up
      3
      arrow-down
      1
      ·
      6 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • GaMEChld@lemmy.world
        arrow-up
        6
        arrow-down
        1
        ·
        6 days ago

        Oh, well another user pointed out that SSNs are not unique, I think they are recycled after death or something. In any case, I do know that when the SSN system was first created it was created by people who said this is NOT MEANT to be treated as unique identifiers for our populace, and if it were it would be more comprehensive than an unsecure string of numbers that anyone can get their hands on. But lo and behold, we never created a proper solution and we ended up using SSNs for identity purposes. Poop.

      • aesthelete@lemmy.world
        arrow-up
        4
        ·
        edit-2
        6 days ago

        SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.

        I mean just think about this a little bit, if the purpose is transactions or something and each row has a SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.

        A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.
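
        A quick sketch of that setup, with identities centralized in one table and transactions holding only a reference id, so that totals per SSN need both a join and a group by. All table names, column names, and amounts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity (
        identity_id INTEGER PRIMARY KEY,
        ssn         TEXT NOT NULL UNIQUE
    );
    CREATE TABLE txn (
        txn_id       INTEGER PRIMARY KEY,
        identity_id  INTEGER NOT NULL REFERENCES identity(identity_id),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO identity VALUES (1, '000-00-0001'), (2, '000-00-0002');
    INSERT INTO txn (identity_id, amount_cents) VALUES
        (1, 1000), (1, 2500), (2, 700), (1, 500);
    """
)

# The SSN lives only in identity, so the report has to join to reach it.
totals = conn.execute(
    """
    SELECT identity.ssn, COUNT(*) AS txn_count, SUM(txn.amount_cents) AS total_cents
    FROM txn
    JOIN identity ON identity.identity_id = txn.identity_id
    GROUP BY identity.ssn
    ORDER BY identity.ssn
    """
).fetchall()

print(totals)
```

        Copying the SSN straight into txn would remove the join at the cost of repeating the value on every row, which is the tradeoff being described.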

        There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.

        • DahGangalang@infosec.pubOP
          arrow-up
          1
          ·
          5 days ago

          had to join to it

          I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?

          I feel like this is showing my own ignorance on the back end of databasing. Can you point me to references that explain this better?

          • aesthelete@lemmy.world
            arrow-up
            2
            ·
            edit-2
            5 days ago

            I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.

            https://www.w3schools.com/sql/sql_join.asp

            Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.

            DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.

            https://en.m.wikipedia.org/wiki/Database_normalization

            In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.

            All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to a SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)

  • Nate Cox@programming.dev
    English
    arrow-up
    48
    ·
    6 days ago

    Because a simple query would have shown that SSN was a compound key with another column (birth date, I think), and not the identifier he thinks it is.

  • Garlicsquash@lemmings.world
    English
    arrow-up
    9
    ·
    5 days ago

    Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn’t understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.
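
    A sketch of that primary key / foreign key arrangement in SQLite via Python, with hypothetical table names. Note that SQLite only enforces foreign keys after the PRAGMA shown, which is a SQLite-specific detail and not something other engines need:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE person (
        ssn       TEXT PRIMARY KEY,
        full_name TEXT NOT NULL
    );
    CREATE TABLE benefit_payment (
        payment_id INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL REFERENCES person(ssn),
        paid_on    TEXT NOT NULL
    );
    INSERT INTO person VALUES ('000-00-0001', 'Alice Example');
    INSERT INTO benefit_payment (ssn, paid_on) VALUES
        ('000-00-0001', '2024-01-03'),
        ('000-00-0001', '2024-02-03');
    """
)

# The same SSN on many child rows is normal: each row links back to one person.
payments_for_ssn = conn.execute(
    "SELECT COUNT(*) FROM benefit_payment WHERE ssn = '000-00-0001'"
).fetchone()[0]

# A child row pointing at an SSN that has no person row is refused.
try:
    conn.execute(
        "INSERT INTO benefit_payment (ssn, paid_on) VALUES ('000-00-0999', '2024-03-03')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(payments_for_ssn, orphan_rejected)
```

    Removing the repeated SSNs from benefit_payment would not clean anything up; it would delete payment records or sever their link to the person.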

  • Hawk@lemmynsfw.com
    arrow-up
    22
    ·
    edit-2
    6 days ago

    It’s because the comments he made are inconsistent with common conventions in data engineering.

    1. It is very common not to deduplicate data and instead just append rows. The current value is the most recent and all the old ones are simply historical. That way you don’t risk losing data and you have an entire history.
      • whilst you could do some trickery to deduplicate the data it does create more complexity. There’s an old saying with ZFS: “Friends don’t let friends dedupe” And it’s much the same here.
      • compression is usually good enough. It will catch duplicated data and deal with it in a fairly efficient way, not as efficient as deduplication but it’s probably fine and it’s definitely a lot simpler
    2. Claiming the government does not use SQL
      • It’s possible they have rolled their own solution or they are using MongoDB or something, but this would be unlikely and wouldn’t really refute the initial claim
      • I believe many other commenters noted that it probably is MySQL anyway.
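
    The append-only convention from point 1 can be sketched in a few lines with Python’s sqlite3 module. The address_log table and its rows are invented for the example; the idea is just that nothing is overwritten and the “current” value is whichever row is newest:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address_log (
        log_id      INTEGER PRIMARY KEY,
        ssn         TEXT NOT NULL,
        address     TEXT NOT NULL,
        recorded_at TEXT NOT NULL
    );
    -- Changes are appended; older rows stay as history.
    INSERT INTO address_log (ssn, address, recorded_at) VALUES
        ('000-00-0001', '1 Old Road',   '2010-05-01'),
        ('000-00-0001', '2 New Street', '2019-09-12'),
        ('000-00-0002', '3 Only Lane',  '2015-01-20');
    """
)

# Current value per SSN: the row with the latest recorded_at for that SSN.
current = conn.execute(
    """
    SELECT a.ssn, a.address
    FROM address_log AS a
    WHERE a.recorded_at = (
        SELECT MAX(b.recorded_at) FROM address_log AS b WHERE b.ssn = a.ssn
    )
    ORDER BY a.ssn
    """
).fetchall()

total_rows = conn.execute("SELECT COUNT(*) FROM address_log").fetchone()[0]
print(current, total_rows)
```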

    Basically, what he said is inconsistent with typical practices among data engineers, as anybody who has worked with larger data would recognize.

    In terms of using SQL, it’s basically just a more reliable and better Excel that doesn’t come with a default GUI.

    If you need to store data, it’s almost always best to throw it into a SQLite database because it keeps it structured. It’s standardised and it can be used from any programming language.

    However, many people use excel because they don’t have experience with programming languages.

    Get ChatGPT to help you write a PyQt GUI for a SQLite database and I think you would develop a high level understanding for how the pieces fit together

    Edit: @zalgotext made a good point.

    • zalgotext@sh.itjust.works
      arrow-up
      8
      ·
      6 days ago

      Great explanation, but I have a tiny, tiny, minor nit-pick

      Basically what he said is incoherent to anybody who has worked with larger data.

      I’m being pedantic, but I disagree with your wording. As a backend dev, I work with relational databases a ton, and what Musk said wasn’t incomprehensible to me, it just sounded like something a first year engineer fresh out of college would say.

      Again, the rest of your explanation is spot on, absolutely no notes, but I do think the distinction between “adult making up incomprehensible bullshit” and “adult cosplaying as a baby engineer who thinks he’s hot shit but doesn’t know anything beyond surface level stuff” is important.

    • turtle [he/him]@lemm.ee
      link
      fedilink
      English
      arrow-up
      4
      ·
      6 days ago

      There’s an old saying with ZFS: “Friends don’t let friends dedupe”

      That’s a bad example to reference. The ZFS implementation of deduplication is poorly thought out, and I say that even though I like and run ZFS on my own Linux server(s). I understand that the BTRFS implementation of dedupe works well (no first-hand experience), and the Windows one works great (first-hand experience).

      • Hawk@lemmynsfw.com
        link
        fedilink
        arrow-up
        1
        ·
        5 days ago

        I’ve had a poor experience with btrfs dedupe tbh (and a terrible experience with qgroups), however, this was years ago. Btrfs snapshots I prefer though, much easier not to have that dependence.

        What distro are you using for ZFS, void?

    • finitebanjo@lemmy.world
      link
      fedilink
      arrow-up
      2
      arrow-down
      16
      ·
      6 days ago

      It was a great answer until the very last sentence. ChatGPT is never a reference for anything ever if you have any fraction of a brain.

      • Hawk@lemmynsfw.com
        link
        fedilink
        arrow-up
        4
        arrow-down
        2
        ·
        6 days ago

        I disagree, it’s just a tool. It’s a fantastic way to template applications very quickly, particularly for those who are not already familiar with technologies and may not have the time or opportunity to play around with things otherwise.

        An LLM is not a search engine and it can produce awful code. This is not production code, it’s for tinkering. As a sandbox tool, LLMs are fantastic.

        On the ethical side of things, yeah, OpenAI sucks. Qwen2.5 would be up to this task, and one can run that locally.

        • finitebanjo@lemmy.world
          link
          fedilink
          arrow-up
          3
          arrow-down
          3
          ·
          6 days ago

          It’s a disinformation machine which completely lacks all context. If it’s about 85% accurate to average internet denizens and 15% hallucination, then it’s an absolutely atrocious source to learn from. You’re literally lying to yourself, that is what the tool does.

          • Hawk@lemmynsfw.com
            link
            fedilink
            arrow-up
            3
            arrow-down
            1
            ·
            6 days ago

            Well, I’ve had a great time using LLMs to sandbox a dozen implementations and then investigate the shortcomings and advantages of different implementations.

            Mistakes happen a lot but they can be managed on a small MWE with a couple of tests.

            It’s how the tool is used more than any given tool being bad.

            I understand your point and you’re not wrong. However, I’m not wrong either and you should take a second look at how you might use these tools in a way that makes your life easier and addresses the valid limitations you’ve described.

  • Sparking@lemm.ee
    link
    fedilink
    English
    arrow-up
    14
    arrow-down
    2
    ·
    5 days ago

    It’s an insanely idiotic thing to say. Federal government IT is myriad, and done at a per-agency level. Any relational database system, which the federal government uses plenty of, uses SQL in one way or another. Elon doesn’t know what he is talking about at all, and is being an ultimate idiot about this. Even in the context of mainframe projects, which he might be referring to if we are giving Elon the benefit of the doubt, most COBOL shops I know have adapted to addressing internal data records using an SQL interface, although obviously in that legacy world it is insanely fractured and arcane.

    • DahGangalang@infosec.pubOP
      link
      fedilink
      arrow-up
      6
      ·
      5 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • Sparking@lemm.ee
        link
        fedilink
        English
        arrow-up
        2
        ·
        4 days ago

        Another commenter pointed out a legitimate use case, but it’s not even worth thinking about that much. De-duplicated is usually a word you use in data science to talk about making sure your dataset is “hygienic” and that you aren’t duplicating data points. A database is much different because it is less about representing data, and more about storing it in a way that allows you to perform transactions at scale: retrieval, storage, modification, etc. Relational databases are analyzed in terms of data cardinality, which essentially describes tradeoffs in representation between speed of retrieval (duplications good) vs storage efficiency (duplications bad).

        The issue is that Elon is so vague and so off the mark that it is very hard to believe that he even has the first clue about what he is talking about. Even you are confused just by reading it. It is all a tactic to convince others that he is smarter than he is while doing extreme damage to the hardworking people that actually make this stuff possible. Have you noticed that the man has never come to a conclusion that wasn’t in his interests? This is not honest intellectualism, or discussion based on technical merit. It’s self-serving propaganda.

  • knightly the Sneptaur@pawb.social
    link
    fedilink
    arrow-up
    29
    arrow-down
    4
    ·
    edit-2
    6 days ago

    To oversimplify, there are two basic kinds of databases: SQL (Structured Query Language, usually pronounced like “sequel” or spelled aloud) and NoSQL (“Not Only SQL”).

    SQL databases work as you’d imagine, with tables of rows and columns like a spreadsheet that are structured according to a fixed schema.

    NoSQL includes all other forms of databases, document-based, graph-based, key-value pairs, etc.

    The former are highly consistent and efficient at processing complicated queries or recording transactions, while the latter are more flexible and can be very fast at reads/writes but are harder to keep in sync as a result.

    All large orgs will have both types in use for different purposes; SQL is better for banking needs where provable consistency is paramount, NoSQL better for real-time web apps and big data processing that need minimal response times and scalable capacity.
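
    To make the fixed-schema versus flexible-document distinction a bit more concrete, here is a small sketch in Python. The `people` table is hypothetical, and the plain dictionary of JSON strings is only a stand-in for a real key-value or document store, not how any particular NoSQL product works:

```python
import json
import sqlite3

# --- SQL side: a fixed schema enforced by the database ---
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "birth_year INTEGER NOT NULL)"
)
conn.execute("INSERT INTO people (name, birth_year) VALUES (?, ?)", ("Ada", 1950))

# A row that omits a required column is rejected by the schema.
try:
    conn.execute("INSERT INTO people (name) VALUES (?)", ("Grace",))
    schema_rejected = False
except sqlite3.IntegrityError:
    schema_rejected = True

# --- NoSQL-style side: each value can have its own shape ---
# A dict of JSON strings stands in for a key-value/document store here.
documents = {}
documents["person:1"] = json.dumps({"name": "Ada", "birth_year": 1950})
documents["person:2"] = json.dumps({"name": "Grace", "nicknames": ["Amazing Grace"]})

# Nothing stops the two documents from having different fields.
document_fields = {key: sorted(json.loads(value)) for key, value in documents.items()}

print(schema_rejected)
print(document_fields)
```

    The relational side refuses inconsistent rows up front, while the document side accepts anything and leaves consistency to the application, which is the tradeoff described above.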

    That Musk would claim the government doesn’t use SQL immediately betrays him as someone who is entirely unfamiliar with database administration, because SQL is everywhere.

    • DahGangalang@infosec.pubOP
      link
      fedilink
      arrow-up
      7
      ·
      6 days ago

      Just so I’m clear, you’re implying that a given SSN could appear associated to multiple “keys” because the key-value pair in a NoSQL database could have complex data.

      An example I can imagine is a widow collecting her dead husband’s Social Security. Her SSN could appear in her own entry and also in her dead husband’s as a payee of that benefit, thus appearing as a “duplicate” SSN.

      Is that in line with what you’re saying?

      • knightly the Sneptaur@pawb.social
        link
        fedilink
        arrow-up
        7
        ·
        edit-2
        6 days ago

        Indeed, that’s a possibility, but I’m not privy to the structure of the social security administration’s databases so I couldn’t say if it was indeed the case.

        The deeper point being, if the government has any databases at all, then some form of Structured Query Language is being used to read and write it.
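
        For what it’s worth, the widow scenario is easy to sketch in a relational layout too. Everything below is hypothetical (table names, columns, and the obviously fake SSNs) and says nothing about how the SSA actually stores its data; it only shows that one SSN can legitimately show up in several rows while each person still exists exactly once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per person, one row per benefit being paid.
conn.executescript(
    """
    CREATE TABLE people (
        ssn  TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE benefits (
        id         INTEGER PRIMARY KEY,
        earner_ssn TEXT NOT NULL REFERENCES people(ssn),
        payee_ssn  TEXT NOT NULL REFERENCES people(ssn),
        kind       TEXT NOT NULL
    );
    """
)

# Fake SSNs (the 000 area number is never issued).
conn.executemany(
    "INSERT INTO people (ssn, name) VALUES (?, ?)",
    [("000-00-0001", "Husband"), ("000-00-0002", "Widow")],
)

# The widow is paid on her own record and as survivor on her husband's.
conn.executemany(
    "INSERT INTO benefits (earner_ssn, payee_ssn, kind) VALUES (?, ?, ?)",
    [
        ("000-00-0002", "000-00-0002", "retirement"),
        ("000-00-0001", "000-00-0002", "survivor"),
    ],
)

# A naive "find duplicate SSNs" query over the benefits table flags her...
repeated = conn.execute(
    "SELECT payee_ssn, COUNT(*) FROM benefits "
    "GROUP BY payee_ssn HAVING COUNT(*) > 1"
).fetchall()

# ...even though she appears exactly once in the people table.
people_count = conn.execute(
    "SELECT COUNT(*) FROM people WHERE ssn = ?", ("000-00-0002",)
).fetchone()[0]

print(repeated, people_count)
```

        Whether a repeated SSN is a problem depends entirely on which table and which column you are counting in.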

        • DahGangalang@infosec.pubOP
          link
          fedilink
          arrow-up
          5
          ·
          6 days ago

          That’s how I feel too.

          Lol, I’d love to see the data he’s trying to speak about (not that that’d be any kind of concerning for privacy /s). I don’t think he’s outright lying, but it definitely feels like a misrepresentation / wrong conclusion from the data.

          But thanks for your part in helping me understand all this!

  • SolidShake@lemmy.world
    link
    fedilink
    arrow-up
    18
    ·
    6 days ago

    How come Republicans keep saying that doggy is going to expose all the fraud in the government but yet the biggest fraud with 37 felonies is president? What the actual fuck do these people think?

  • Honytawk@lemmy.zip
    link
    fedilink
    English
    arrow-up
    15
    arrow-down
    2
    ·
    6 days ago

    He is saying the US government doesn’t use structured databases.

    At least 90% of all databases have a structure.

    • DahGangalang@infosec.pubOP
      link
      fedilink
      arrow-up
      2
      arrow-down
      1
      ·
      5 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • Sparking@lemm.ee
        link
        fedilink
        English
        arrow-up
        4
        ·
        5 days ago

        As someone explained in another comment, you often duplicate information due to rules around cardinality to gain improvements in retrieval and structure. I would be pretty worried if SSNs were being used as a widespread primary key in any set of tables. Those should generally be UUIDs that can be optimized for hashing while avoiding collisions.
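
        A minimal sketch of what I mean by keying on a UUID instead of the SSN, using Python’s `sqlite3` and `uuid` modules. The `persons` and `payments` tables and the `add_person` helper are invented for the example, and the SSN is fake:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Hypothetical layout: the SSN is just an attribute, not the key.
conn.executescript(
    """
    CREATE TABLE persons (
        person_id TEXT PRIMARY KEY,   -- surrogate key (UUID as text)
        ssn       TEXT NOT NULL,
        name      TEXT NOT NULL
    );
    CREATE TABLE payments (
        id           INTEGER PRIMARY KEY,
        person_id    TEXT NOT NULL REFERENCES persons(person_id),
        amount_cents INTEGER NOT NULL
    );
    """
)


def add_person(connection, ssn, name):
    """Insert a person under a freshly generated random UUID and return it."""
    person_id = str(uuid.uuid4())
    connection.execute(
        "INSERT INTO persons (person_id, ssn, name) VALUES (?, ?, ?)",
        (person_id, ssn, name),
    )
    return person_id


first_id = add_person(conn, "000-00-0001", "Example Person")

# Other tables point at the UUID, so the SSN is stored in one place only.
conn.executemany(
    "INSERT INTO payments (person_id, amount_cents) VALUES (?, ?)",
    [(first_id, 150000), (first_id, 150000)],
)

payment_count = conn.execute(
    "SELECT COUNT(*) FROM payments WHERE person_id = ?", (first_id,)
).fetchone()[0]
ssn_rows = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]

print(first_id, payment_count, ssn_rows)
```

        With this shape, a correction to someone’s SSN touches one row, and nothing else in the schema has to change.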

        Even if we are being generous to Elon, we could assume that social security payments are processed on mainframes given how many have to go out and the legacy nature of the program. Most mainframe shops I know have adapted an SQL interface for records in some capacity, but who knows what he is looking at.

        Federal government IT is done on a per-agency basis. I would say Oracle Database is pretty much the most licensed piece of software the government uses outside of Red Hat Linux and Windows desktop.