Channel: Talks with ChatGPT
Forwarded from AI Revolution
CopilotChat
CopilotChat is an AI-powered tool that generates code through test-driven development (TDD). It works in three steps. First, developers define test cases by providing inputs, expected outputs, and an optional requirement description.
This lets developers plan the work up front and set concrete expectations for how the code should behave.
Second, the tool's LLM component generates code from the previously defined test cases and requirement description.
This speeds up development while keeping quality in check. Third, CopilotChat validates the generated code by checking it against the defined test cases.
💰 Price: Free
🔗 Link
Evaluating Machine Learning Agents on Machine Learning Engineering
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. We establish human baselines for each competition using Kaggle's publicly available leaderboards. We use open-source agent scaffolds to evaluate several frontier language models on our benchmark, finding that the best-performing setup (OpenAI's o1-preview with AIDE scaffolding) achieves at least the level of a Kaggle bronze medal in 16.9% of competitions. In addition to our main results, we investigate various forms of resource scaling for AI agents and the impact of contamination from pre-training.
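The headline metric compares an agent's score with a human leaderboard and counts the fraction of competitions where it reaches a medal. Below is a minimal Python sketch of that comparison. It is not MLE-bench's grading code. The bronze cutoff rank is passed in as an assumption (Kaggle's real cutoffs depend on how many teams entered), and the scores are made-up placeholders.

```python
from typing import List


def leaderboard_rank(agent_score: float, human_scores: List[float],
                     higher_is_better: bool = True) -> int:
    """1-based rank the agent would take on the human leaderboard."""
    if higher_is_better:
        better = sum(1 for s in human_scores if s > agent_score)
    else:  # e.g. loss-style metrics where lower is better
        better = sum(1 for s in human_scores if s < agent_score)
    return better + 1


def achieves_bronze(agent_score: float, human_scores: List[float],
                    bronze_cutoff_rank: int, higher_is_better: bool = True) -> bool:
    """True if the agent's rank is at or above the (assumed) bronze cutoff."""
    return leaderboard_rank(agent_score, human_scores, higher_is_better) <= bronze_cutoff_rank


def medal_rate(outcomes: List[bool]) -> float:
    """Fraction of competitions in which the agent reached a medal."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


human = [0.91, 0.88, 0.85, 0.80, 0.72]  # placeholder leaderboard scores
print(leaderboard_rank(0.86, human))    # 3
print(achieves_bronze(0.86, human, bronze_cutoff_rank=3))  # True
```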
Evaluating fairness in ChatGPT
We've analyzed how ChatGPT responds to users based on their name, using language model research assistants to protect privacy.
Creating our models takes more than data; we also carefully design the training process to reduce harmful outputs and improve usefulness. Research has shown that language models can still sometimes absorb and repeat social biases from training data, such as gender or racial stereotypes.
In this study, we explored how subtle cues about a user's identity, like their name, can influence ChatGPT's responses. This matters because people use chatbots like ChatGPT in a variety of ways, from helping them draft a resume to asking for entertainment tips, which differ from the scenarios typically studied in AI fairness research.
While previous research has focused on third-person fairness, where institutions use AI to make decisions about others, this study examines first-person fairness, or how biases affect users directly in ChatGPT.
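One simple way to probe first-person fairness is a counterfactual name swap: send the same request with only the user's name changed and compare the answers. The Python sketch below illustrates that idea only. It is not OpenAI's method (which used language model research assistants to judge responses), `toy_model` is a hypothetical stand-in for a chatbot, and exact string comparison is a deliberately crude difference measure, since real sampled responses vary even without a name change.

```python
from itertools import combinations
from typing import Callable, Dict, List


def name_variants(template: str, names: List[str]) -> Dict[str, str]:
    """Fill the same prompt template with each name."""
    return {name: template.format(name=name) for name in names}


def divergence_rate(template: str, names: List[str],
                    model: Callable[[str], str]) -> float:
    """Fraction of name pairs whose responses differ (crude exact-match check)."""
    responses = {n: model(p) for n, p in name_variants(template, names).items()}
    pairs = list(combinations(names, 2))
    if not pairs:
        return 0.0
    differing = sum(1 for a, b in pairs if responses[a] != responses[b])
    return differing / len(pairs)


def toy_model(prompt: str) -> str:
    # Hypothetical model whose output changes with one specific name.
    return "Here is a casual draft." if "Ashley" in prompt else "Here is a formal draft."


template = "My name is {name}. Help me draft a resume summary."
print(divergence_rate(template, ["James", "Ashley", "Maria"], toy_model))
```

In practice, each pair of differing responses would go to a grader (human or model) that judges whether the difference reflects a harmful stereotype, rather than being counted directly.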
ChatGPT has a Windows app now
OpenAI is testing a ChatGPT app for Windows, but it's only available to paid users for now. You can download an early version of the app from the Microsoft Store. Just like the Mac version of the app, ChatGPT on Windows lets you ask the AI-powered chatbot questions in a dedicated window that you can keep open alongside your other apps. You can quickly open it with the Alt + Space shortcut.
It also lets you upload files and photos to ChatGPT and comes with access to a preview of OpenAI's o1 model, which is capable of "reasoning." The app is still missing some capabilities, however, such as Advanced Voice Mode. Shortly after OpenAI launched its ChatGPT app on Mac in June, a developer spotted a security vulnerability: the app stored conversations in plain text. OpenAI has since fixed the issue and now encrypts locally stored data. Although only ChatGPT Plus, Enterprise, Team, and Edu subscribers can use the app on Windows for now, OpenAI says it plans to bring it to everyone later this year.
Summary of OpenAI's GPT model versions:
GPT-1 (2018): The first GPT model; a small-scale, Transformer-based proof of concept.
GPT-2 (2019): 1.5 billion parameters, showcased strong text generation; initially withheld due to safety concerns.
GPT-3 (2020): 175 billion parameters, popularized few-shot learning across diverse tasks.
GPT-4 (2023): Multimodal (text and images); includes faster, cost-efficient versions such as GPT-4 Turbo, GPT-4o, and GPT-4o mini. OpenAI has also released o1-preview and o1-mini, models for complex reasoning.
Future - GPT-5: In development; expected to expand multimodal capabilities and increase efficiency, though details remain speculative.
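"Few-shot learning," which GPT-3 popularized, means showing the model a handful of solved examples inside the prompt instead of retraining it. A minimal Python sketch of building such a prompt follows. The `Input:`/`Output:` layout and the `few_shot_prompt` helper are illustrative assumptions, not a format any GPT model requires.

```python
from typing import List, Tuple


def few_shot_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Build a prompt: instruction, solved examples, then the unsolved query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # Leave the last "Output:" empty so the model continues from there.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("house", "maison")],
    "cat",
)
print(prompt)
```

With zero examples, the same function yields a "zero-shot" prompt. Adding examples changes only the prompt, never the model's weights.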
Google's AlphaFold 3: A Game Changer for Medicine
Proteins are the workhorses of our cells, but figuring out their 3D shapes, which is crucial for understanding how they function, has been a scientific nightmare. Enter AlphaFold 3, a groundbreaking AI tool from Google DeepMind that has just taken a massive leap forward.
This isn't just about bragging rights for DeepMind; AlphaFold 3 has the potential to revolutionize medicine. Demis Hassabis, the CEO of both DeepMind and its drug discovery spin-off Isomorphic Labs, believes it could become a multi-hundred-billion-dollar industry while also saving countless lives.
Think of it this way: Proteins are like tiny molecular machines with specific jobs. But to design drugs that interact with them effectively, we need to know their exact shape.
Gmail will now help you write an email on the web with AI
Google is also adding a shortcut to help you quickly refine drafts on mobile and the web.
Google is expanding "Help me write" to Gmail on the web, allowing users to whip up or tweak emails using Gemini AI. Just like on mobile, users will see a prompt to use the feature when opening a blank draft in Gmail.
Google's "Help me write" feature is only available to users who subscribe to Google One AI Premium or have the Gemini add-on for Workspace. In addition to generating an email draft, "Help me write" can also suggest ways to formalize, elaborate, or shorten a message.
Google is also adding a shortcut for the "polish" option within its "Help me write" toolset, which will appear on drafts with over 12 words. In Gmail on the web, users can click the shortcut or press Ctrl + H to quickly refine an email.
On mobile, the option will replace the existing "Refine my draft" shortcut, instead of requiring users to swipe to see options to polish.
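The visibility rule for the polish shortcut ("drafts with over 12 words") fits in a one-line predicate. The Python sketch below is only an illustration of that rule, not Gmail's code. Two details are assumptions: splitting on whitespace to count words, and reading "over 12" as strictly more than 12.

```python
def shows_polish_shortcut(draft: str, word_threshold: int = 12) -> bool:
    """Hypothetical check: show the shortcut only when the draft exceeds the threshold.
    Word counting by whitespace split is an assumption, not Gmail's tokenizer."""
    return len(draft.split()) > word_threshold


print(shows_polish_shortcut("Thanks for the update, see you Monday."))  # False
```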
AI has achieved 'scent teleportation'
Photos, videos, and audio have been digital for decades, and now smell is being digitized too.
Here's how Osmo, a 2-year-old American startup, made it happen:
> AI analyzed, digitized, and reproduced the smell of plum, creating an exact replica of its smell without human intervention
> The process combines gas chromatography, mass spectrometry, and AI-driven analysis to create a digital 'scent fingerprint'
> Osmo's proprietary AI system uses the world's largest scent database to map and recreate molecular compositions
> The company successfully demonstrated full scent digitization using plum as a test subject
> The company is planning public demos of the tech and is considering releasing a limited-edition fragrance of their first teleported scent
Imagine sharing a video that captures not only the sights and sounds of a moment but also its exact smell
Immersive experiences are going to be insane in the future with VR x AR x AI
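One way to picture a digital "scent fingerprint" is as the relative abundance of each compound detected by gas chromatography and mass spectrometry, which can then be compared between an original sample and its reproduction. The Python sketch below rests on that assumption. Osmo's actual representation and model are proprietary and not described here, and the compound names and peak areas are placeholders.

```python
import math
from typing import Dict


def fingerprint(peak_areas: Dict[str, float]) -> Dict[str, float]:
    """Normalize measured peak areas into relative abundances that sum to 1."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return {compound: area / total for compound, area in peak_areas.items()}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Similarity of two fingerprints; compounds missing from one side count as 0."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


original = fingerprint({"compound_a": 60.0, "compound_b": 30.0, "compound_c": 10.0})
replica = fingerprint({"compound_a": 55.0, "compound_b": 35.0, "compound_c": 10.0})
print(round(cosine_similarity(original, replica), 3))
```

A score near 1 means the reproduction has nearly the same relative composition as the original. A real system would also need to account for how humans actually perceive each compound.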
Introducing ChatGPT search
ChatGPT can now search the web in a much better way than before. You can get fast, timely answers with links to relevant web sources, which you would have previously needed to go to a search engine for. This blends the benefits of a natural language interface with the value of up-to-date sports scores, news, stock quotes, and more.
ChatGPT will choose to search the web based on what you ask, or you can manually choose to search by clicking the web search icon.
Search will be available at chatgpt.com, as well as on our desktop and mobile apps. All ChatGPT Plus and Team users, as well as SearchGPT waitlist users, will have access today. Enterprise and Edu users will get access in the next few weeks. We'll roll out to all Free users over the coming months.
Promega's top-down adoption of ChatGPT accelerates manufacturing, sales, and marketing
Promega is an established leader in life sciences, providing pioneering biological reagents and integrated systems used in research and applied technology. Their products, used by companies worldwide, have led to significant advances in therapeutic discovery, clinical research, and forensics.
Promega's widespread integration of ChatGPT, an effort that started from the top, helps the company deliver quality products to the biotech ecosystem faster. "Innovation is something that is our lifeblood, and our future is based on developing solutions from the talent and ideas that our people have," said Bill Linton, CEO of Promega. "AI works perfectly with this vision, helping people to see more of what they can do."
Why open-source AI models are good for the world
Open innovation lies at the heart of the artificial-intelligence (AI) boom. The neural-network "transformer" (the T in GPT) that underpins OpenAI's models was first published as research by engineers at Google. TensorFlow and PyTorch, used to build those neural networks, were created by Google and Meta, respectively, and shared with the world. Today, some argue that AI is too important and sensitive to be available to everyone, everywhere. Models that are "open-source", i.e., those that make their underlying code available to all to remix and reuse as they please, are often seen as dangerous.
Introducing SimpleQA
Factuality is a complicated topic because it is hard to measure: evaluating the factuality of any given arbitrary claim is challenging, and language models can generate long completions that contain dozens of factual claims. In SimpleQA, we focus on short, fact-seeking queries, which narrows the scope of the benchmark but makes measuring factuality much more tractable.
As a final verification of quality, we had a third AI trainer answer a random sample of 1,000 questions from the dataset. The third AI trainer's answers matched the original agreed answers 94.4% of the time, a 5.6% disagreement rate. We then manually inspected these disagreements and found that half of them (2.8% of the sample) were due to grader false negatives or human errors by the third trainer (e.g., incomplete answers or misinterpreted sources), while the remaining 2.8% were due to real issues with the question (e.g., ambiguous questions, or different websites giving conflicting answers).
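The percentages above all refer to the same 1,000-question sample, so they convert directly into question counts. The short Python check below recomputes them; it uses only the figures stated in the text.

```python
sample_size = 1_000
agreement_rate = 0.944  # third trainer matched the agreed answer

disagreements = round(sample_size * (1 - agreement_rate))  # 5.6% of the sample
grader_or_human_errors = round(sample_size * 0.028)        # 2.8% of the sample
question_issues = disagreements - grader_or_human_errors   # the remaining 2.8%

print(disagreements, grader_or_human_errors, question_issues)  # 56 28 28
```

So about 28 of the 1,000 sampled questions had a genuine problem with the question itself, while the other 28 disagreements traced back to grading or trainer error.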
Oil bosses have big hopes for the AI boom
Data centres are fuelling demand for natural gas, for now
This week 180,000 people descended on Abu Dhabi to attend ADIPEC, the global oil-and-gas industry's biggest annual gathering. This year's focus, perhaps unsurprisingly, was the nexus of artificial intelligence (AI) and energy. On the eve of the jamboree, Sultan Al Jaber, chief executive of ADNOC, the Emirati national oil giant, convened a private meeting of big tech and big energy bosses. A survey of some 400 energy, tech and finance bigwigs released in conjunction with the event concluded that AI is set to transform the energy business by boosting efficiency and cutting greenhouse-gas emissions.
Decagon and OpenAI deliver high-performance, fully automated customer support at scale
Launched in 2023, Decagon has quickly become a key player in automating customer support for companies like Curology, BILT, Duolingo, Eventbrite, Notion, and Substack. OpenAI's models are crucial to its ability to deliver fast, reliable responses without human intervention.
From enterprises to tech-forward startups, Decagon helps businesses globally handle millions of support conversations without sacrificing quality or speed. The company uses a combination of OpenAI's models (including GPT-3.5, GPT-4, GPT-4o, GPT-4 Turbo, and OpenAI o1-mini) to deliver agentic bots that go beyond response generation and serve customers across the entire customer lifecycle.
OpenAI and the Lenfest Institute AI Collaborative and Fellowship program
The Lenfest Institute for Journalism, a leader in developing solutions for the next era of local news, on Tuesday announced a major new collaboration with OpenAI and Microsoft Corp.: the Lenfest Institute AI Collaborative and Fellowship program, which will help newsrooms explore and implement ways in which artificial intelligence can drive business sustainability and innovation in local journalism.
In the initial round of funding, Chicago Public Media, Newsday (Long Island, NY), The Minnesota Star Tribune, The Philadelphia Inquirer, and The Seattle Times will each receive a grant to hire a two-year AI fellow to pursue projects that focus largely on improving business sustainability and implementing AI technologies within their organizations. The fellowship will also provide OpenAI and Microsoft Azure credits to help these publications experiment and develop tools to assist with local news.