
🌐🤖 Eh, Google Stay Training Bard With Da Web Data Scraps!

⬇️ Pidgin | ⬇️ ⬇️ English

Ho, dis Monday, Gizmodo wen spot dat Google wen change up their privacy policy. Dea policy now say dat all kine AI services from Google, like da Bard and Cloud AI, could be getting their smarts from da public data dat Google wen scrape up off da web. 🕵️‍♂️🔎

Google wen say long time ago dat their privacy policy clear dat they use da public stuffs from da web for training language models for things like Google Translate. Dat Google spokesperson, Christa Muldoon, wen tell The Verge all about it. She say, dis last update just let us know dat new kine services like Bard also stay included. She say, they put privacy principles and safeguards in da making of their AI tech, in line with Google’s AI Principles. 👩‍💻🔒

From July 1st, 2023, Google’s privacy policy been saying that “Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public” and that the company may “use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.” 📚🚀

If you take one look at da policy’s revision history, you can see dat the update give little bit more clearness as to which services gonna be trained with da data dey collect. Da document now say dat the information could be used for “AI Models” instead of just “language models,” giving Google more freedom for training and building systems next to LLMs with your public data. But dat part stay hidden under one link for “publicly accessible sources” under the policy’s “Your Local Information” tab dat you gotta click for open the right section. 📄🔗

Da new policy say dat “publicly available information” is used for training Google’s AI products but no say how (or if) the company gonna stop copyrighted stuffs from being included in that data pool. Plenty public websites get rules dat no allow data collection or web scraping for training big language models and other AI tools. Gonna be interesting for see how dis approach work out with all da global rules like GDPR dat protect people from their data being used all any kine without their say so. 🌍📜

Da laws and more competition in da market have made the makers of popular AI systems like OpenAI’s GPT-4 real hush-hush about where they got da data used for training and if they include things like social media posts or copyrighted works by human artists and authors. 🎨📚

Da question about if the fair use doctrine go for this kind stuff still sit in a legal gray area. Da unsureness wen spark all kine lawsuits and push lawmakers in some countries for make more strict laws dat can better control how AI companies collect and use their training data. It also raise questions about how this data being processed for make sure it no lead to dangerous mess ups in AI systems, with da people who gotta sort through these big data pools often having to work long hours and under tough conditions. ⚖️👩‍⚖️

Gannett, da biggest newspaper publisher in the United States, is suing Google and its parent company, Alphabet, claiming that advancements in AI technology have helped the search giant to hold a monopoly over the digital ad market. Products like Google’s AI search beta have also been called “plagiarism engines” and criticized for starving websites of traffic. 📰💼

Meanwhile, Twitter and Reddit — two social platforms that contain big amounts of public information — have recently taken drastic measures to try and prevent other companies from freely harvesting their data. The API changes and limitations placed on the platforms have been met with backlash by their respective communities, as anti-scraping changes have negatively affected the core Twitter and Reddit user experiences. 🐦💬

So, no matter what, looks like we all gotta be more aware about how our data being used, yeah? Keep your eyes open and stay informed, cuz we all part of da digital world, whether we like it or not! 🌺🤙


NOW IN ENGLISH

🌐🤖 Google Confirms It’s Also Training Bard on Scraped Web Data!

On Monday, Gizmodo spotted that Google had updated its privacy policy. The policy now states that Google’s artificial intelligence (AI) services, including Bard and Cloud AI, may be trained on public data that the tech giant has scraped from the web. 🕵️‍♂️🔎

Google says its privacy policy has long made clear that it uses publicly available information from the internet to train language models for services like Google Translate. Google spokesperson Christa Muldoon told The Verge that the latest update simply clarifies that newer services, such as Bard, are also included. She added that Google incorporates privacy principles and safeguards into the development of its AI technologies, in line with its AI Principles. 👩‍💻🔒

Since July 1st, 2023, Google’s privacy policy has stated that “Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public” and that the company may “use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.” 📚🚀

The policy’s revision history shows that the update clarifies which services will be trained on the collected data. The document now says the information may be used for “AI Models” rather than just “language models,” giving Google more leeway to train and build systems other than LLMs on your public data. However, this note is buried under an embedded link for “publicly accessible sources” in the policy’s “Your Local Information” tab, which users must click to expand the relevant section. 📄🔗

The updated policy says that “publicly available information” is used to train Google’s AI products, but it doesn’t explain how (or if) the company will keep copyrighted material out of that data pool. Many publicly accessible websites have policies that prohibit data collection or web scraping for the purpose of training large language models and other AI tools. It will be interesting to see how this approach squares with global regulations like the GDPR, which protect people against their data being used without their express permission. 🌍📜

The combination of these laws and increased market competition has made the makers of popular AI systems, such as OpenAI’s GPT-4, secretive about their data sources, particularly about whether those sources include social media posts or copyrighted works by human artists and authors. 🎨📚

Whether the fair use doctrine applies to this kind of activity remains a legal gray area. That uncertainty has spurred numerous lawsuits and pushed lawmakers in some countries to introduce stricter laws that better regulate how AI companies collect and use their training data. It also raises questions about how this data is processed to ensure it doesn’t contribute to dangerous failures in AI systems, given that the people tasked with sorting through these vast pools of training data are often subjected to long hours and harsh working conditions. ⚖️👩‍⚖️

Gannett, the largest newspaper publisher in the United States, is suing Google and its parent company, Alphabet. It claims that advancements in AI technology have helped the search giant maintain a monopoly over the digital ad market. Products like Google’s AI search beta have also been labeled “plagiarism engines” and criticized for starving websites of traffic. 📰💼

Meanwhile, Twitter and Reddit, two social platforms that hold vast amounts of public information, have recently taken drastic measures to stop other companies from freely harvesting their data. The API changes and restrictions imposed on both platforms have been met with backlash from their communities, as the anti-scraping changes have degraded the core user experience on Twitter and Reddit. 🐦💬

So, it’s clear that we all need to be more aware of how our data is being used, right? Stay vigilant and informed, because we are all part of the digital world, whether we like it or not! 🌺🤙
