Quite frequently I come across scanned books that are viewable for free online. Sometimes the publisher put them there (preview chapters, for example), sometimes a library did (old books from its collection that are in the public domain), and so on. Since I like hoarding data, and the online viewers used to present the books are often not very practical, I frequently try to download the books one way or another. This requires toying with the “inspect element” tool and various other methods of getting at the images/PDF. Now, all that I access is what is, well, accessible; I don’t hack into the servers or anything. But the stuff is meant to be hidden from the normal user. Does that act of hiding the material, no matter how primitive and easily circumvented, mean that I’m not allowed to access it at all?

I suppose ripping a public domain book is no big deal, but would books under copyright fare differently?

Mainly I’m asking out of curiosity, I don’t expect the police to come visit me for ripping a 16th century dictionary.

Note: I live in the EU, but I’d be curious to hear how this is treated elsewhere too.

Edit: I also remembered a funny trick I noticed on one site - it allows viewing PDFs on their website, but not downloading, unless you pay for the PDF. But when you load the page, even without paying, the PDF is already downloaded onto your computer and can be found in the browser cache. Is it legal to simply save the file that is already on your computer?

  • simple@lemm.ee
    2 months ago

    AFAIK web scraping (the act of grabbing and downloading any data you see available on the internet) isn’t illegal, and I would assume downloading PDFs provided to you online would fall under that. Since it is copyrighted it would probably be illegal to share it, though.

    • nvermind@lemm.ee
      2 months ago

      This. In a case involving LinkedIn, US courts ruled that it’s legal to scrape publicly available data. The company doing the scraping was selling that data to corporate customers, but ultimately what you’re allowed to do might depend on the information you’re accessing and under what permissions. (Not a lawyer)

  • Vipsu@lemmy.world
    2 months ago

    According to big tech, it’s OK if you’re training a large language model with it.

  • ulterno@lemmy.kde.social
    2 months ago

    > viewable for free online

    If you are viewing it on your computer, you have already downloaded it.
    Don’t let anyone tell you otherwise.

    > already downloaded onto your computer and can be found in the browser cache

    Exactly.

  • aaaaace@lemmy.blahaj.zone
    2 months ago

    Ask the AI companies who scraped my sites while the media companies were DMCA-ing everything in sight and working with enforcement paid for with public funds to prosecute/persecute the “pirates”.

    • AwkwardLookMonkeyPuppet@lemmy.world
      2 months ago

      It’s ridiculous that Homeland Security is spending resources taking down pirate sites. That’s a department specifically created to prevent terrorism, and instead they’re operating as Pinkertons for broadcasting companies.

  • Excrubulent@slrpnk.net
    2 months ago

    I’d say if the copyright holder says you’re not allowed to then you’re not. It’s piracy.

    People will tell you that you’ve already downloaded the data, so saving it is fundamentally, technically no different, but that doesn’t matter to the law; it’s still piracy.

    Like yeah, it’s absurd and pointless and anti-consumer and anti-knowledge and unenforceable and unsustainable, but that’s copyright. It’s always been that way.

    Copyright destroys culture and piracy is our ethical duty in the face of that. The only reason to care about it is so you don’t get caught.

      • Excrubulent@slrpnk.net
        2 months ago

        Sure, and I’d say that’s piracy too. I wouldn’t mind if it wasn’t also being siloed into private hands to enrich the wealthy and screw the rest of us.

  • Etterra@lemmy.world
    2 months ago

    It might be illegal to post it without permission, but you can download it all you damn well please and they can’t stop you. Unless it’s like government top secret something or other. In that case you probably don’t want it anywhere near your computer and should probably tell somebody where you found it.

    • ulterno@lemmy.kde.social
      2 months ago

      > should probably tell somebody where you found it

      Somebody, as in your lawyer. Who can then inform the correct authorities, while making sure you don’t become their scapegoat.

  • hendrik@palaver.p3x.de
    2 months ago

    Depends on where you are. Usually if it’s a legal source, you can save it. But you’re not supposed to share it unless given permission. If you downloaded it from a source that’s not legal, things might change, depending on the specifics of your law.

    • Excrubulent@slrpnk.net
      2 months ago

      Right click -> inspect element (Q) works.

      You can also press F12.

      And if right click is blocked, on Firefox holding SHIFT will unblock right click. There is also a plugin that does this for you.

      Often websites will put an invisible element in front of the content to intercept this trick, but you can navigate through the elements to find the one they were trying to obfuscate.

      • Buddahriffic@lemmy.world
        2 months ago

        Also, you can just block elements you right click on in Firefox (though this might be an option added by an add-on). If there are hidden elements, you just need to go through each of those until you can click on the one you want directly (and you can tell by what is highlighted in the inspect element mode).

        You can also hit delete in inspect element mode to remove that element. You can also edit whatever you want in the element. Makes me wish it existed back when I was doing more web dev work, would have made things a lot easier when debugging.

    • antonim@lemmy.dbzer0.com (OP)
      2 months ago

      (Sorry for the late response.) Well it depends a lot on the site. Since I focus on books and scholarly articles, the ideal way is to find the URL of the original PDF. The website might show you just individual pages as images, but it might hide the link to the PDF somewhere in the code. Alternatively, you might just obtain all the URLs of the individual page images, put them all into a download manager, and later bundle them all into a new PDF. (When you open the “inspect element” window, you just have to figure out which part of the code is meant to display the pages/images to you.) Sometimes the PDFs and page images can be found in your browser cache, as I mention in the OP. There’s quite some variety among the different sites, but with even the most rudimentary knowledge of web design you should be able to figure out most of them.
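
      To make the "find the page images or the hidden PDF link in the code" step a bit more concrete, here is a rough Python sketch using only the standard library. It is just an illustration: the sample HTML, the `library.example` address, the `data-src` attribute and the names `ScanLinkFinder` and `find_scan_links` are all made up, and real viewers structure their pages differently, so expect to adapt it per site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScanLinkFinder(HTMLParser):
    """Collect page-image URLs and PDF links from a viewer page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Only look at attributes that usually carry a URL.
            # "data-src" is an assumption: some viewers lazy-load images this way.
            if not value or name not in ("src", "href", "data-src"):
                continue
            url = urljoin(self.base_url, value)  # resolve relative paths
            path = url.split("?", 1)[0].lower()  # ignore any query string
            if path.endswith(".pdf"):
                self.pdf_urls.append(url)
            elif tag == "img" and path.endswith((".jpg", ".jpeg", ".png")):
                self.image_urls.append(url)


def find_scan_links(html, base_url):
    """Return (image_urls, pdf_urls) found in the given HTML, in page order."""
    finder = ScanLinkFinder(base_url)
    finder.feed(html)
    return finder.image_urls, finder.pdf_urls


# Hypothetical page source, standing in for what "inspect element" would show.
SAMPLE_HTML = """
<div class="viewer">
  <img src="/scans/book1/page-001.jpg">
  <img data-src="/scans/book1/page-002.jpg">
  <a href="/files/book1.pdf" style="display:none">full text</a>
</div>
"""

images, pdfs = find_scan_links(SAMPLE_HTML, "https://library.example/view/book1")
print("\n".join(images))
print("\n".join(pdfs))
```

      The resulting list of image URLs is what you would paste into a download manager. Pages that build their image list with JavaScript won't show up this way, and in that case the browser's network tab or cache is usually the better place to look.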

      If you need help with ripping something in particular, DM me and I’ll give it a try.

  • schnurrito@discuss.tchncs.de
    2 months ago

    If it’s in the public domain, it’s almost certainly legal. I don’t have the general answer to your question.

    Really this question shows how outdated copyright law is; in many countries it prohibits “copying”, but in the age of computers nearly all accessing of information involves “copying” it in some way.

  • orcrist@lemm.ee
    2 months ago

    If something is in the public domain, there is no copyright covering it, so you should make as many copies as you feel like. Many public domain books are posted on the Internet Archive, where you can easily download them in various formats. Then you won’t have to work hard to get the data. Public domain artwork, likewise, is often available on Wikimedia Commons.

  • oxjox@lemmy.ml
    2 months ago

    Digital tools such as you’ve described could be used by the service to manage access to content. A book’s author or publisher may object to the book being available for free. There may be limits on the amount of time you can read a book. Some content may be public domain, but there may be versions of that content which the publisher has altered in some way, making some portions of the book not public domain.

    Knowingly possessing something that was not freely provided to you or the public by the licensed owner, or otherwise known to be unprotected by copyright, is not legal. Just because a file is cached on your device does not mean you are the legal owner of that content forever.

    There’s a number of reasons you may be charged to download a pdf. It could be a means of legally granting ownership and sharing revenue with the content owner. It could be because the authorized provider of the content is simply charging to maintain the service you’ve acquired the content from. It could be both or it could be a sketchy website just trying to get your CC info.

    This is coming from the perspective of someone in the US. I’m not sure about the rest of the world but imagine basic copyright laws are similar around the world.

    • antonim@lemmy.dbzer0.com (OP)
      2 months ago

      Honestly much of your reply is confusing me and doesn’t seem to be relevant to my questions. This is what I think is crucial:

      > Just because a file is cached on your device does not mean you are the legal owner of that content forever.

      What does being “the legal owner forever” actually entail, either with regards to a physical book or its scan? And what does that mean regarding what I can legally do with the cached file on my computer?

      • oxjox@lemmy.ml
        2 months ago

        If you have legally obtained something, you have agreed to the terms of ownership with the provider / owner / creator of the content. Whether you find a document on your computer or you have paid for it, it does not explicitly give you full ownership of that data forever.

        For example: if you buy a DVD from a store, you’re actually purchasing a license to watch the content of that DVD. If you were to give or sell that disc to someone else, you are transferring your permission to watch that disc to them. So, if you were to rip that movie to your computer, legally you only have permission to watch that for as long as you are in possession of that physical media.
        Conversely, if you were to “buy” a movie from an online platform, they may revoke your right to watch that movie if the publisher of that content (or a government agency) no longer permits them to stream that content to you. If you were to download that movie, that does not change the agreement you made with the service to watch it. This is why it’s not possible to save an iTunes video purchase to your computer in a non-encrypted format.

        In other words, you’ve got to read the terms and conditions. Even then, they may change the terms and conditions of the agreement.