Tumgik
#keywording
prokopetz · 6 months
Text
Shout-out to TCGs with such rigorous keywording that even the word "card" has a specific definition and each card is obliged to explicitly tell you whether or not the card you're holding in your hand is a card.
3K notes · View notes
Text
python keyword extraction using nltk wordnet
import re # include wordnet.morphy from nltk.corpus import wordnet # https://pythonprogrammingsnippets.tumblr.com/ def get_non_plural(word): # return the non-plural form of a word # if word is not empty if word != "": # get the non-plural form non_plural = wordnet.morphy(word, wordnet.NOUN) # if non_plural is not empty if non_plural != None: # return the non-plural form # print(word, "->", non_plural) return non_plural # if word is empty or non_plural is empty return word def get_root_word(word): # return the root word of a word # if word is not empty if word != "": word = get_non_plural(word) # get the root word root_word = wordnet.morphy(word) # if root_word is not empty if root_word != None: # return the root word # print(word, "->", root_word) word = root_word # if word is empty or root_word is empty return word def process_keywords(keywords): ret_k = [] for k in keywords: # replace all characters that are not letters, spaces, or apostrophes with a space k = re.sub(r"[^a-zA-Z' ]", " ", k) # if there is more than one whitespace in a row, replace it # with a single whitespace k = re.sub(r"\s+", " ", k) # remove leading and trailing whitespace k = k.strip() k = k.lower() # if k has more than one word, split it into words and add each word # back to keywords if " " in k: ret_k.append(k) # we still want the original keyword k = k.split(" ") for k2 in k: #if not is_adjective(k2): ret_k.append(get_root_word(k2)) ret_k.append(k2.strip()) else: # if not is_adjective(k): ret_k.append(get_root_word(k)) ret_k.append(k.strip()) # unique ret_k = list(set(ret_k)) # remove empty strings ret_k = [k for k in ret_k if k != ""] # remove all words that are less than 3 characters ret_k = [k for k in ret_k if len(k) >= 3] # remove words like 'and', 'or', 'the', etc. ret_k = [k for k in ret_k if k not in ["and", "or", "the", "a", "an", "of", "to", "in", "on", "at", "for", "with", "from", "by", "as", "into", "like", "through", "after", "over", "between", "out", "against", "during", "without", "before", "under", "around", "among", "throughout", "despite", "towards", "upon", "concerning", "of", "to", "in", "on", "at", "for", "with", "from", "by", "as", "into", "like", "through", "after", "over", "between", "out", "against", "during", "without", "before", "under", "around", "among", "throughout", "despite", "towards", "upon", "concerning", "this", "that", "these", "those", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "will", "would", "shall", "should", "can", "could", "may", "might", "must", "ought", "i", "me", "my", "mine", "we", "us", "our", "ours", "you", "your", "yours", "he", "him", "his", "she", "her", "hers", "it", "its", "they", "them", "their", "theirs", "what", "which", "who", "whom", "whose", "this", "that", "these", "those", "myself", "yourself", "himself", "herself", "itself", "ourselves", "yourselves", "themselves", "whoever", "whatever", "whomever", "whichever", "whichever" ]] return ret_k def extract_keywords(paragraph): if " " in paragraph: return paragraph.split(" ") return [paragraph]
example usage:
the_string = "Jims House of Judo and Karate is a martial arts school in the heart of downtown San Francisco. We offer classes in Judo, Karate, and Jiu Jitsu. We also offer private lessons and group classes. We have a great staff of instructors who are all black belts. We have been in business for over 20 years. We are located at 123 Main Street." keywords = process_keywords(extract_keywords(the_string)) print(keywords)
output:
# output: ['jims', 'instructors', 'class', 'lesson', 'all', 'school', 'san', 'martial', 'classes', 'karate', 'great', 'lessons', 'downtown', 'private', 'arts', 'also', 'locate', 'belts', 'business', 'judo', 'years', 'located', 'main', 'street', 'jitsu', 'house', 'offer', 'staff', 'group', 'heart', 'instructor', 'belt', 'black', 'francisco', 'jiu']
1 note · View note
Text
got a worm nibbling my brain. can someone help me find a piece of obscure media?
webcomic/indie comic from the 2010s. basically a sci-fi short story about a young girl (with red hair?) who was being raised by scientists as part of an experiment. she receives a haircut/has her head shaved, in preparation for her annual brain scan/testing. it is revealed that while her body is human, her "brain" is artificial, made of computer implants throughout her skull and spine. at some point her biological mother (also a scientist on the same campus?) encounters her and is repulsed, viewing her as a machine who has murdered her daughter.
it was very poignant and it bruised my heart and i can NOT find it anywhere
3K notes · View notes
Text
Google is (still) losing the spam wars to zombie news-brands
Tumblr media
I'm touring my new, nationally bestselling novel The Bezzle! Catch me TONIGHT (May 3) in CALGARY, then TOMORROW (May 4) in VANCOUVER, then onto Tartu, Estonia, and beyond!
Tumblr media
Even Google admits – grudgingly – that it is losing the spam wars. The explosive proliferation of botshit has supercharged the sleazy "search engine optimization" business, such that results to common queries are 50% Google ads to spam sites, and 50% links to spam sites that tricked Google into a high rank (without paying for an ad):
https://developers.google.com/search/blog/2024/03/core-update-spam-policies#site-reputation
It's nice that Google has finally stopped gaslighting the rest of us with claims that its search was still the same bedrock utility that so many of us relied upon as a key piece of internet infrastructure. This not only feels wildly wrong, it is empirically, provably false:
https://downloads.webis.de/publications/papers/bevendorff_2024a.pdf
Not only that, but we know why Google search sucks. Memos released as part of the DOJ's antitrust case against Google reveal that the company deliberately chose to worsen search quality to increase the number of queries you'd have to make (and the number of ads you'd have to see) to find a decent result:
https://pluralistic.net/2024/04/24/naming-names/#prabhakar-raghavan
Google's antitrust case turns on the idea that the company bought its way to dominance, spending the some of the billions it extracted from advertisers and publishers to buy the default position on every platform, so that no one ever tried another search engine, which meant that no one would invest in another search engine, either.
Google's tacit defense is that its monopoly billions only incidentally fund these kind of anticompetitive deals. Mostly, Google says, it uses its billions to build the greatest search engine, ad platform, mobile OS, etc that the public could dream of. Only a company as big as Google (says Google) can afford to fund the R&D and security to keep its platform useful for the rest of us.
That's the "monopolistic bargain" – let the monopolist become a dictator, and they will be a benevolent dictator. Shriven of "wasteful competition," the monopolist can split their profits with the public by funding public goods and the public interest.
Google has clearly reneged on that bargain. A company experiencing the dramatic security failures and declining quality should be pouring everything it has to righting the ship. Instead, Google repeatedly blew tens of billions of dollars on stock buybacks while doing mass layoffs:
https://pluralistic.net/2024/02/21/im-feeling-unlucky/#not-up-to-the-task
Those layoffs have now reached the company's "core" teams, even as its core services continue to decay:
https://qz.com/google-is-laying-off-hundreds-as-it-moves-core-jobs-abr-1851449528
(Google's antitrust trial was shrouded in secrecy, thanks to the judge's deference to the company's insistence on confidentiality. The case is moving along though, and warrants your continued attention:)
https://www.thebignewsletter.com/p/the-2-trillion-secret-trial-against
Google wormed its way into so many corners of our lives that its enshittification keeps erupting in odd places, like ordering takeout food:
https://pluralistic.net/2023/02/24/passive-income/#swiss-cheese-security
Back in February, Housefresh – a rigorous review site for home air purifiers – published a viral, damning account of how Google had allowed itself to be overrun by spammers who purport to provide reviews of air purifiers, but who do little to no testing and often employ AI chatbots to write automated garbage:
https://housefresh.com/david-vs-digital-goliaths/
In the months since, Housefresh's Gisele Navarro has continued to fight for the survival of her high-quality air purifier review site, and has received many tips from insiders at the spam-farms and Google, all of which she recounts in a followup essay:
https://housefresh.com/how-google-decimated-housefresh/
One of the worst offenders in spam wars is Dotdash Meredith, a content-farm that "publishes" multiple websites that recycle parts of each others' content in order to climb to the top search slots for lucrative product review spots, which can be monetized via affiliate links.
A Dotdash Meredith insider told Navarro that the company uses a tactic called "keyword swarming" to push high-quality independent sites off the top of Google and replace them with its own garbage reviews. When Dotdash Meredith finds an independent site that occupies the top results for a lucrative Google result, they "swarm a smaller site’s foothold on one or two articles by essentially publishing 10 articles [on the topic] and beefing up [Dotdash Meredith sites’] authority."
Dotdash Meredith has keyword swarmed a large number of topics. from air purifiers to slow cookers to posture correctors for back-pain:
https://housefresh.com/wp-content/uploads/2024/05/keyword-swarming-dotdash.jpg
The company isn't shy about this. Its own shareholder communications boast about it. What's more, it has competition.
Take Forbes, an actual news-site, which has a whole shadow-empire of web-pages reviewing products for puppies, dogs, kittens and cats, all of which link to high affiliate-fee-generating pet insurance products. These reviews are not good, but they are treasured by Google's algorithm, which views them as a part of Forbes's legitimate news-publishing operation and lets them draft on Forbes's authority.
This side-hustle for Forbes comes at a cost for the rest of us, though. The reviewers who actually put in the hard work to figure out which pet products are worth your money (and which ones are bad, defective or dangerous) are crowded off the front page of Google and eventually disappear, leaving behind nothing but semi-automated SEO garbage from Forbes:
https://twitter.com/ichbinGisele/status/1642481590524583936
There's a name for this: "site reputation abuse." That's when a site perverts its current – or past – practice of publishing high-quality materials to trick Google into giving the site a high ranking. Think of how Deadspin's private equity grifter owners turned it into a site full of casino affiliate spam:
https://www.404media.co/who-owns-deadspin-now-lineup-publishing/
The same thing happened to the venerable Money magazine:
https://moneygroup.pr/
Money is one of the many sites whose air purifier reviews Google gives preference to, despite the fact that they do no testing. According to Google, Money is also a reliable source of information on reprogramming your garage-door opener, buying a paint-sprayer, etc:
https://money.com/best-paint-sprayer/
All of this is made ten million times worse by AI, which can spray out superficially plausible botshit in superhuman quantities, letting spammers produce thousands of variations on their shitty reviews, flooding the zone with bullshit in classic Steve Bannon style:
https://escapecollective.com/commerce-content-is-breaking-product-reviews/
As Gizmodo, Sports Illustrated and USA Today have learned the hard way, AI can't write factual news pieces. But it can pump out bullshit written for the express purpose of drafting on the good work human journalists have done and tricking Google – the search engine 90% of us rely on – into upranking bullshit at the expense of high-quality information.
A variety of AI service bureaux have popped up to provide AI botshit as a service to news brands. While Navarro doesn't say so, I'm willing to bet that for news bosses, outsourcing your botshit scams to a third party is considered an excellent way of avoiding your journalists' wrath. The biggest botshit-as-a-service company is ASR Group (which also uses the alias Advon Commerce).
Advon claims that its botshit is, in fact, written by humans. But Advon's employees' Linkedin profiles tell a different story, boasting of their mastery of AI tools in the industrial-scale production of botshit:
https://housefresh.com/wp-content/uploads/2024/05/Advon-AI-LinkedIn.jpg
Now, none of this is particularly sophisticated. It doesn't take much discernment to spot when a site is engaged in "site reputation abuse." Presumably, the 12,000 googlers the company fired last year could have been employed to check the top review keyword results manually every couple of days and permaban any site caught cheating this way.
Instead, Google is has announced a change in policy: starting May 5, the company will downrank any site caught engaged in site reputation abuse. However, the company takes a very narrow view of site reputation abuse, limiting punishments to sites that employ third parties to generate or uprank their botshit. Companies that produce their botshit in-house are seemingly not covered by this policy.
As Navarro writes, some sites – like Forbes – have prepared for May 5 by blocking their botshit sections from Google's crawler. This can't be their permanent strategy, though – either they'll have to kill the section or bring it in-house to comply with Google's rules. Bringing things in house isn't that hard: US News and World Report is advertising for an SEO editor who will publish 70-80 posts per month, doubtless each one a masterpiece of high-quality, carefully researched material of great value to Google's users:
https://twitter.com/dannyashton/status/1777408051357585425
As Navarro points out, Google is palpably reluctant to target the largest, best-funded spammers. Its March 2024 update kicked many garbage AI sites out of the index – but only small bottom-feeders, not large, once-respected publications that have been colonized by private equity spam-farmers.
All of this comes at a price, and it's only incidentally paid by legitimate sites like Housefresh. The real price is borne by all of us, who are funneled by the 90%-market-share search engine into "review" sites that push low quality, high-price products. Housefresh's top budget air purifier costs $79. That's hundreds of dollars cheaper than the "budget" pick at other sites, who largely perform no original research.
Google search has a problem. AI botshit is dominating Google's search results, and it's not just in product reviews. Searches for infrastructure code samples are dominated by botshit code generated by Pulumi AI, whose chatbot hallucinates nonexistence AWS features:
https://www.theregister.com/2024/05/01/pulumi_ai_pollution_of_search/
This is hugely consequential: when these "hallucinations" slip through into production code, they create huge vulnerabilities for widespread malicious exploitation:
https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/
We've put all our eggs in Google's basket, and Google's dropped the basket – but it doesn't matter because they can spend $20b/year bribing Apple to make sure no one ever tries a rival search engine on Ios or Safari:
https://finance.yahoo.com/news/google-payments-apple-reached-20-220947331.html
Google's response – laying off core developers, outsourcing to low-waged territories with weak labor protections and spending billions on stock buybacks – presents a picture of a company that is too big to care:
https://pluralistic.net/2024/04/04/teach-me-how-to-shruggie/#kagi
Google promised us a quid-pro-quo: let them be the single, authoritative portal ("organize the world’s information and make it universally accessible and useful"), and they will earn that spot by being the best search there is:
https://www.ft.com/content/b9eb3180-2a6e-41eb-91fe-2ab5942d4150
But – like the spammers at the top of its search result pages – Google didn't earn its spot at the center of our digital lives.
It cheated.
Tumblr media
If you'd like an essay-formatted version of this post to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
https://pluralistic.net/2024/05/03/keyword-swarming/#site-reputation-abuse
Tumblr media
Image: freezelight (modified) https://commons.wikimedia.org/wiki/File:Spam_wall_-_Flickr_-_freezelight.jpg
CC BY-SA 2.0 https://creativecommons.org/licenses/by-sa/2.0/deed.en
842 notes · View notes
jimimn · 7 months
Text
Tumblr media Tumblr media Tumblr media Tumblr media
Q. What recent advice gave you a lot of strength?
1K notes · View notes
time-is-restored · 6 months
Text
i know there's virtually no media coverage for it, but australia did turn out yesterday for palestine across our capital cities. brisbane adelaide melbourne and sydney all came out, and several of the crowds were the biggest they've been since the protests started. we're still here, and we still care, even if murdoch wants you to think we're silent
Tumblr media Tumblr media Tumblr media
2K notes · View notes
Text
atla modern au where suki & zuko are life guards for the summer and sokka just keeps drowning
936 notes · View notes
chanrizard · 3 months
Text
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
CHRIS BANG #4 🐺 221128
1K notes · View notes
fresh-bed-old-sheets · 4 months
Text
Okay i'll go along with this trend because i need outer motivation
If this post gets 700 notes i WILL pressure myself to limit my screen time and i WILL TRY
765 notes · View notes
kanrix · 4 months
Text
It's all about "respecting characters' canon (queer) sexualities!!" until it's a repulsed Aroace
588 notes · View notes
duchi-nesten · 1 year
Text
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
🪦 DannyMay Day 18: Grave 🪦
Corpse AU anyone? 👀 Pretty sure this is my first time trying to do something angsty, did I do good? :)
3K notes · View notes
wolfythewitch · 1 year
Text
Tumblr media
sketchin
2K notes · View notes
littlecrittereli · 6 days
Text
Wild Kratts beach episodes my beloved
Tumblr media
202 notes · View notes
dicennio · 1 year
Photo
Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media Tumblr media
BLOOD FIEND - POWER
2K notes · View notes
isbergillustration · 2 months
Text
Tumblr media
A very fun commission I got to do for @phynoma
320 notes · View notes
deathflare · 2 months
Text
Tumblr media
remembered another instance of erenville smiling genuinely at wuk lamat (when she was just LEGS) clutching my heart… manifesting siblings…..
244 notes · View notes