
This is a conversation we had during Aaron Swartz Day, November 2025.

By f from Sutty, with contributions from PIP P) pirates.

Updates

  • March 6th, 2026: Added a mention of Coopcloud and Anubis support.

This is research that is just beginning, with no conclusions reached yet. It is based on conversations we’ve been having and conversations we’ve seen happening elsewhere, as we try to determine what our place in them is.

What’s happening now with capitalism is that we’re seeing companies competing for who can come up with the best model of artificial intelligence.
This involves the construction of data centers, built on colonial models: the genocide in Sudan for the control of coltan and other “rare earths”, colonial extractivism in Africa and Abya Yala, and the persecution and assassination of land defenders. It also involves testing these systems in “the markets” of the social control of networks and perfecting them in other genocides, in particular the Palestinian genocide, as with Palantir, soon to be deployed in Argentina and Latin America.

The construction of data centers involves the destruction of water and land, as our comrades at Tu Nube Seca mi Rio follow closely. It works in very similar ways to other extractivist projects that have already faced years of struggle and resistance.
We affirm that these struggles against mining, against monocultures, for land and life are intertwined and overlapping, and that we must stand in solidarity with each other. Even though no one understands how AIs work, we all know the effects of environmental destruction by capitalist corporations, foreign or local: the empty promise of job creation, polluted land and water, and so on.

In this article, we will attempt to shine a light on another facet, which has to do with the colonization of the Internet and cyberspace as a field of struggle: in particular, the surprising appearance of thousands of simultaneous visits to autonomous servers, which self-host services by and for small communities (small compared to the billions of users of Instagram, X, Facebook and TikTok, at least). These visits exhaust the few resources such servers have, forcing them to close projects or spend a lot of time trying to fight back, and they are generated automatically by crawlers: programs that download entire sites, in this case to process them and train artificial intelligence models.

As a first working hypothesis or historical analogy, we believe we are looking at a new process of original accumulation, like the one that allowed the establishment of capitalism by accumulating land ownership in the hands of a few while dispossessing peasants, turning them into an urban proletariat. Not to mention the appropriation of knowledge turned into industrial processes, such as the recently revisited case of the Jamaican foundries that gave rise to the modern processing of iron, appropriating processes developed by enslaved people. These were and are violent processes of social re-accommodation and concentration of capital. Our criticism of the liberal defense of “free culture” is that it actively avoids this history of struggle, using more watered-down terms such as the “enclosure” of knowledge to refer to the same historical process, separating us when we should recognize ourselves as part of the same movement.

We then ask ourselves what current process of dispossession gives continuity to this analogy in the concrete case of the race for AI. As providers of web hosting, and reading and listening to others in the field of autonomous infrastructures, self-hosting, algorithmic disobedience and digital gardens, what we experience is a growing and worrying avalanche of visits to our servers, downloading all the information published on our sites.

This is something that has been happening forever, from search indexers like Google and others to the Internet Archive’s Wayback Machine. None of these services could work without downloading our sites. For the sake of visibility and collective memory, we often allow it.

The escalation is that our servers now receive thousands of hits from many different addresses which, a little analysis makes clear, are automated, but which are difficult to detect individually. Many of these crawlers do not identify themselves, nor do they respect limits like those of robots.txt. They just visit us again and again, wear us down and take us off the Internet.

Following our initial analogy, they force us to move from our own servers to larger providers such as Cloudflare, which have the ability to limit or block them, or even to negotiate company to company. This accelerates the concentration of resources in these few providers, generating new web oligopolies.

We had this conversation in November 2025, 12 years after the suicide of Aaron Swartz, persecuted for hacking with the aim of freeing as much scientific research as he could, trapped in the capitalist circuit of scientific journals. Without going into whether this is the kind of knowledge that should be shared or not, the contradiction is that Meta, OpenAI and others have done the same, downloading terabytes and terabytes of books, papers, websites, movies (including porn) and anything else they can, “violating copyright”, without any of the consequences that Aaron and other pirates and hackers have suffered for years. They granted themselves a letter of marque; as always, it’s a question of power.

The carbon footprint of the web

In parallel, we have been following other conversations about the environmental impact of the Internet itself. Some good practices for web developers like us have been proposed by Sustainable Web Design, in particular an algorithm that estimates CO2 emissions from the data transfer involved in visiting a website. This model uses the energy consumed per byte transferred as a variable in order to estimate the CO2 emitted in producing that energy. It takes into account the country of origin of the visit and the energy consumed by the visitor’s device, the network and the server, based on the geographic location of the server.

At Sutty we modified this model a bit to be able to separate the CO2 emitted by the visitor and the CO2 emitted by our infrastructure, based on the country of each.
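As a rough illustration of how such an estimate can be computed, the sketch below splits a visit’s emissions between the visitor’s side and our infrastructure’s side, each weighted by the grid intensity of its own country. The coefficients and per-country intensities are illustrative placeholders in the spirit of the Sustainable Web Design model, not the official published values nor Sutty’s actual implementation.

```python
# Rough per-visit CO2 estimate, inspired by the Sustainable Web Design model.
# All coefficients and grid intensities below are illustrative placeholders.

ENERGY_PER_GB_KWH = 0.81  # illustrative: energy used per gigabyte transferred
SHARE = {"device": 0.52, "network": 0.14, "datacenter": 0.15, "production": 0.19}

# Illustrative grid carbon intensities, in grams of CO2 per kWh.
GRID_G_PER_KWH = {"AR": 350, "US": 380, "WORLD": 442}

def visit_co2_grams(bytes_transferred, visitor_country, server_country):
    """Split the estimate between the visitor (device + production) and our
    infrastructure (data center + network), by the country of each."""
    energy_kwh = (bytes_transferred / 1e9) * ENERGY_PER_GB_KWH

    visitor_kwh = energy_kwh * (SHARE["device"] + SHARE["production"])
    server_kwh = energy_kwh * (SHARE["datacenter"] + SHARE["network"])

    visitor_g = visitor_kwh * GRID_G_PER_KWH.get(visitor_country, GRID_G_PER_KWH["WORLD"])
    server_g = server_kwh * GRID_G_PER_KWH.get(server_country, GRID_G_PER_KWH["WORLD"])
    return {"visitor": visitor_g, "infrastructure": server_g}

# Example: a 2 MB page viewed from Argentina, served from the United States.
print(visit_co2_grams(2_000_000, "AR", "US"))
```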

Based on this, we are currently calculating the CO2 emissions of our live servers. The question that arises is what to do with this information. One thing we can do is use it to have a more concrete idea of our own environmental impact, making it visible to users and visitors.

We can look at other experiences that allow us to visualize this impact and compare it to hamburgers consumed, where two contradictions arise. One is that comparing web visits with hamburgers is a very very very very very very gringo unit of comparison --have you seen the meme that gringos need to have things explained to them in hamburgers? The other contradiction is that it puts the focus back on individual responsibility for environmental impact (“eat less hamburgers”) instead of where the real responsibility lies (the industrial, colonial, patriarchal, Western, capitalist mode of production), in the same way that a dripping faucet does not compare to the amount of water polluted by an open pit mine or, for that matter, a data center.

And in that sense, we believe we can keep a record of our supposed environmental impact, contextualized to a worker cooperative based in a colonial and colonized country, one which relies on infrastructure and data centers in the United States. This leads us to say that the problem lies elsewhere, without washing our hands of it.

Could we compare our environmental impact and say that we emit 0.0000000000001% of what a U.S. bomb dropped by Israel on Gaza does? Or the emissions per second of Vaca Muerta, or any extractivist, genocidal, terricidal project: can we know this information?

This relates to the hypothesis of original accumulation in that these CO2 emissions are unnecessary and produced by a capitalist race in which we may not want to participate, or in which we at least want to decide what part we will play and from which stance we are going to resist.

Current strategies

Outsourcing

If we host our own servers, all of this is highly time-consuming, so the easiest thing to do would be to go to Cloudflare and let them take care of it. The problem, as we mentioned, is the concentration of web distribution services in a few providers, repeating the hegemonic “scale” model of technology development.

There are other options, such as Deflect.

These third parties can also implement the following strategies.

Block or limit

Another option is to block them completely; there are several ways to go about this. One is robots.txt, aimed at crawlers that follow the standard, informing them that we would prefer they not visit us.

At Sutty we are evaluating the option of allowing users to activate a robots.txt file.
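As a sketch, a robots.txt along these lines asks the crawlers that do identify themselves not to download a site for training. The user agent names below are only a few commonly cited examples, and any such list has to be maintained by hand as new crawlers appear; it is not the file Sutty would generate.

```
# Illustrative, partial list of self-identifying AI crawlers.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else may keep visiting.
User-agent: *
Disallow:
```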

But the most problematic crawlers are those that neither identify themselves nor respect our requests, and that go as far as doing everything they can to prevent us from identifying them. In this case there are comrades blocking by ASN (autonomous system number), because even though visits from a single crawler can come from many IP addresses, they all belong to the same group and owner, identified by its ASN.

Sutty is also implementing this strategy, incorporating ASN identification in our visitor logs, which allows us to group the origin of visits without having to record the IP address --which would allow us to single out the address and potentially de-anonymize visitors. As these groupings have an owner, and that record is public, it is possible to group many visits, maybe individual, maybe not, by their owner. With this we can analyze the logs and know which sets of visits come from a network that is not a “first mile” network, such as the Internet provider we pay for where we live, but rather a data center.
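A minimal sketch of this kind of grouping, assuming a local copy of an ASN database such as GeoLite2-ASN and the maxminddb Python library; the file name and the aggregation shown here are illustrative, not Sutty’s actual implementation.

```python
# Sketch: group visits by ASN instead of storing IP addresses.
# Assumes a local ASN database (e.g. GeoLite2-ASN.mmdb) and the maxminddb library.
from collections import Counter
import maxminddb

reader = maxminddb.open_database("GeoLite2-ASN.mmdb")
visits_per_asn = Counter()

def record_visit(ip_address):
    """Look up the autonomous system that announces this address and keep
    only that grouping, discarding the IP itself."""
    record = reader.get(ip_address) or {}
    asn = record.get("autonomous_system_number")
    org = record.get("autonomous_system_organization", "unknown")
    # Only the (ASN, owner) pair is kept; the IP address is not written anywhere.
    visits_per_asn[(asn, org)] += 1

# After processing the logs, the heaviest ASNs can be reviewed by hand:
# visits from data center networks rather than "first mile" providers
# are candidates for limiting or blocking.
for (asn, org), count in visits_per_asn.most_common(10):
    print(f"AS{asn} {org}: {count} visits")
```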

Doing this, we can block those that we consider malicious, or limit those that are in doubt, so that they do not use up too many resources and therefore unnecessarily increase our carbon footprint and the use of our resources to generate AI models.

Test “humanity”

“Testing humanity” is a technique in which the server asks us to interact with it in a way that only humans could, either by demonstrating cognitive and/or motor skills, or by making our computer consume more power, demonstrating that it really wants to visit the server.

The first option is the most annoying one, as it outsources to all of us the cost of calculating who is human and who isn’t. It is also problematic because it assumes a model of humanity in which everyone has the same cognitive, motor, and even cultural and literacy capabilities, which is to say it is ultimately an ableist and colonial model.

This is the model of the “captcha”, whose most extractivist version is ReCaptcha, which takes advantage of this human energy to train recognition models for cars, stairs, traffic lights and so on.

The second, the one geared around using more energy, is automatic and involves doing nothing but waiting: the calculation is that making the device use more energy is a bit annoying for us, but very expensive for the data centers that are making millions of visits.
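A very small sketch of the idea behind this second option: the server hands out a random challenge, and the visitor must find a nonce whose hash meets a difficulty target, which is cheap to verify but costly to compute at scale. Anubis and similar tools implement this in the visitor’s browser with many more details; the difficulty value and function names here are illustrative.

```python
# Sketch of a proof-of-work challenge: find a nonce whose hash has enough
# leading zero bits. A single visitor pays a moment of computation; a crawler
# farm pays it millions of times.
import hashlib
import os

DIFFICULTY_BITS = 20  # arbitrary example value

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: bytes) -> int:
    """Work done by the visitor: try nonces until the hash is hard enough."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= DIFFICULTY_BITS:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int) -> bool:
    """Work done by the server: a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS

challenge = os.urandom(16)
nonce = solve(challenge)
print(verify(challenge, nonce))  # True
```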

In this we can perhaps glimpse a short-term strategy, like that of Anubis, for certain sites. We recently collaborated with Coopcloud on adding Anubis support to some of its recipes. This will allow projects like Escuela Común to better protect their servers.

Accelerate “collapse”

The approach of the Algorithmic Sabotage Research Group (ASRG, no apparent relation to Las Ketchup) is more cataclysmic. Since we cannot avoid being visited by these crawlers, what we can do is go on the offensive and “poison” the models they train by providing them with false or meaningless information. These traps or tarpits make it possible to trick crawlers by leading them into infinite visit patterns, where they will find more and more links to follow as well as meaningless articles. What would be moderately confusing, cumbersome or fun for a while for a human visitor becomes infinite for a crawler that is not prepared for this labyrinth.

There are many explorations in this direction, ranging from strategies that require little energy on our side, such as providing them with a random selection of words, to strategies that are more involved but more difficult to detect, generating texts that are expected to be “meaningless” to humans, based on the actual content of our sites.
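As a sketch of the low-energy end of this spectrum, the toy server below answers any path with a page of random words and more links leading deeper into the labyrinth. A real deployment would only expose this to visits already classified as crawlers and would slow its responses down; the word list, paths and setup here are purely illustrative.

```python
# Sketch of a minimal tarpit: every path returns random "words" and more
# links into an endless labyrinth. Illustrative only; serve this only to
# visits already classified as malicious crawlers.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["agua", "nube", "rio", "semilla", "huerta", "memoria", "red", "maiz"]

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        text = " ".join(random.choices(WORDS, k=200))
        links = "".join(
            f'<a href="{self.path.rstrip("/")}/{random.randrange(10**6)}">seguir</a> '
            for _ in range(10)
        )
        body = f"<html><body><p>{text}</p><p>{links}</p></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Tarpit).serve_forever()
```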

The accelerated collapse would be that poisoning these models causes the quality of their responses to decrease significantly, so much so that any human asking them questions will receive answers that clearly lack logic, meaning or veracity.

Our question is what a collapse would even involve, when we are already seeing reports of psychotic breaks facilitated by conversations with ChatGPT, conversations that validate hallucinations and suicidal ideation, the generation of fascist propaganda, coupled with an overall very low level of text comprehension facilitated by the extraction of attention.

And what happens when political decisions are automated and produce harmful long-term effects, regardless of the rationality of the model consulted?

What is the collapse of an AI trained to kill racialized people?

Rather, we think this acceleration assumes a global rationality based on Western academic thinking, though perhaps we are exaggerating in asking these questions. In any case, there is potential to be found in it, in that it incites us to go on the offensive, to return to them “the cost” of identifying what is true and what is not and of controlling the quality of their models, a cost which in turn will fall back on traumatized and exploited workers in the global south.

Zipbombs

A zipbomb is an offensive technique that could be temporarily disastrous for simple crawlers, but very easy to counter if its use becomes widespread. The idea is to apply as much compression as possible to a file full of repeated data, such as zeros, which produces a very small file that, when decompressed, uses up all available computational resources (storage, RAM, processor). For example, 4.5PB of zeros can be compressed into 42KB. If some 4,500,000,000,000,000 bytes can be sent in 42,000 bytes, it is very “cheap” to send over the Internet, but impossible to decompress without exhausting the receiver’s computational resources. However, in order to send one of these zipbombs we have to be very sure that we are detecting a malicious agent.
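A minimal sketch of how such a decoy can be prepared with gzip, the compression that crawlers and browsers already negotiate over HTTP. A single pass over zeros compresses at roughly a thousand to one (the extreme ratios above require nested archives), but even that is enough for a few megabytes on our side to expand to gigabytes in the crawler’s memory. Sizes and the file name are illustrative, and, as said above, this should only ever be served to visitors we are very sure are malicious.

```python
# Sketch: prepare a gzip "bomb" of zeros to serve (with Content-Encoding: gzip)
# only to visitors already classified as malicious crawlers. gzip compresses a
# run of zeros at roughly 1000:1, so the 10 GiB below fits in about 10 MiB.
import gzip

CHUNK = b"\0" * (1024 * 1024)   # 1 MiB of zeros
TOTAL_MIB = 10 * 1024           # 10 GiB uncompressed, illustrative size

with gzip.open("bomb.gz", "wb", compresslevel=9) as f:
    for _ in range(TOTAL_MIB):
        f.write(CHUNK)

# The resulting bomb.gz is around 10 MiB; a crawler that naively decompresses
# the response will try to allocate about 10 GiB.
```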

The role of autonomous infrastructures

What is the role of autonomous infrastructures? How can they incorporate these strategies, while we decide among all user communities how we want to respond to AI model training? Perhaps the plurality of strategies and the diversity of implementations that we believe we have is what will defend us. Can we test all strategies, can we go unnoticed, can we surf the tsunami?

We believe that collectives and communities of users need to be involved in the conversation about how to respond to this, so that it does not remain the technical and sullen work of the system administrator --the cisadmin--, fed up with crashing servers.

Between the capitalist model of digital monocultures and the individual model of the digital garden, we have to find the digital community garden, the cyberspace of mutual sustenance and aid.

Visibility or invisibility?

If we block or poison the models, we become invisible; we escape “the algorithm”. But if what AIs say becomes the truth, wouldn’t it be strategically in our best interest to be as visible as we can? Like the meme where Grok corrects the veracity of Musk’s statements, or the discussions between alternative and community media and free communication platforms populated only by the already convinced: what we don’t allow AIs to see will cease to be part of the corpus of possible answers, of the universe of possibilities we will encounter as long as we are collectively dependent on Big Tech. If what AIs answer is going to be the undisputed truth, how and from where are we going to dispute those truths?

Shouldn’t we also fill the web with links between us, sharing and deepening discussions, allowing whoever finds us to also find other information that will be veiled or censored by search engines?

For example, while researching the ASRG for this article, the site we knew of was inaccessible, and a Wayback Machine search told us that the site had never been archived, so we could not reference that experience.

What we need is a selective invisibility, albeit one that does not work collectively as a form of gatekeeping (curiously, this reference does not exist in Spanish).