When you need to produce content quickly, it can be tempting to simply query ChatGPT or another LLM and ask it to do the work for you. But here’s a cautionary tale.
Today, I downloaded a white paper from a respected organization. It included “advice” from several industry consultants on topics important to the industry. As I browsed through the “advice,” a few items seemed all too familiar.
You see, I also belong to a group that discusses prompts you can use to get completely digital “advice” from LLMs on these very topics. We compare the results from the different systems, including ChatGPT-4 and Claude, which often provide very similar answers.
Sure enough, the advice provided by two of these consultants in the white paper was ALMOST IDENTICAL to some of the results from the model prompts we used in the other group.
You can interpret this in two ways.
First, the consultants’ published advice could have been used to train the LLM models and ranked high enough that they became the LLMs’ definitive responses on the subject matter… and if you believe this, I have some land in Florida to sell you.
Second, the consultants took the easy way out and had ChatGPT produce the responses for them.
Think of the implications.
Would you be willing to spend thousands of dollars with a consulting organization that relied on a tool you could easily use yourself to produce their advice? If you hired that organization, then checked their work against your generative AI tool, how would you feel if you saw nothing original in their work? Would you continue to trust them?
I’m not saying you shouldn’t use generative AI to help you create content.
ChatGPT and other generative AI tools can add a lot of efficiency and insight to your content creation, so you should take advantage of them.
So how can you best use them?
Ideation: Start with the results of your initial brainstorming, then ask the system to expand on your ideas, suggest alternatives, or add a list of similar concepts. This can improve your content while maintaining the originality of your ideas. Be sure to fact-check everything the bot produces… it generates text, not facts, and is prone to inaccuracy.
Content Creation: Include as much of your original content as possible in your prompt, ideally a detailed outline or rough draft. Ask the bot to write an article based on the information provided. For a different perspective, ask the bot to write an article contradicting the information you provided.
Editing: Iterate, alternating your manual edits with the bot’s. Consider asking it to change the tone or style, expand on certain concepts, or shorten others. Always conclude with your own editing pass to keep it “yours.”
Advanced tip: Use an advanced tool that can prioritize your own data and research when producing content. Microsoft’s targeted Copilots and other Retrieval Augmented Generation systems can produce content using large amounts of your information for source material.
If you use these tools wisely, you can improve the quality of your content, greatly reduce editing time, and enhance your reputation by providing high-quality ORIGINAL content.
Matrix Potential helps clients accelerate their growth by implementing Generative AI systems to address critical business needs. Click here to get started.
This article was human-produced. ChatGPT 4o was used only to edit for spelling, grammar, and punctuation. The title was suggested by ChatGPT.
Several years back, we were given several long lists of Korean addresses and contact information to import into our CRM system. The lists were a mess, with addresses in several different formats, unseparated fields, characters that had been converted to multi-character sets, and lots of near-duplicate data. And they were not formatted at all for import into a US-structured CRM system.
After examining the data, we spent days developing a series of scripts to iteratively attack the problems, one at a time. When we were done, almost 95% of the data was in a form that could be imported. The rest required manual intervention.
We were given many more projects. Different types of data. Similar problems. A customized solution for each dataset. Lots of labor hours.
With Generative AI and tools like ChatGPT the game has changed and projects like this can be easy. You can complete them in hours, not days, and with better accuracy than previous methods allowed.
Today we'll explore some easy methods for cleaning data. But first we'll discuss the risks involved in the process and how to mitigate them.
From this...
to this...
Risks and mitigation
Any time you send data to external services for processing you run the risk of that data being intercepted. With many free cloud services you may even inadvertently grant the host the right to use that data for a wide variety of purposes. And there's always the risk of bad actors masquerading as legitimate services to intercept your data. To reduce these risks, be sure to:
Have and enforce company policies for using data in the cloud, approving tools, certifying users, and approving use cases.
Identify and document the sensitivity of your data sets and manage access.
Prohibit the use of "free" tools and restrict your users to only using paid, licensed products. These usually have much more restrictive and controllable terms regarding the provider's use of input prompts and output generated by your users.
Assure that any tools you use effectively encrypt your data, both in transit and while stored.
Use tools that separate your raw data from the language model platform you are using. These are often referred to as Retrieval Augmented Generation (RAG) systems.
For highly critical applications, use self-contained tools that run directly on your device or network, without internet access.
Matrix Potential can help you navigate the risks of exposing your data and build secure tools for your use case. Get in touch with us here.
Turning poorly formatted data into tables
Here are some prompts you can use to clean up data. Depending on the tool you choose, you can use them iteratively and even build specific prompt sets for different types of data.
When collecting data from multiple sources or conducting research, the information likely won't come in a format you can simply import into your internal systems. Here's an example prompt to help you clean up poorly formatted contact lists. It can easily be applied to other types of data.
I'm going to provide a large amount of text. Convert this text into a table with the following columns: First Name, Last Name, Title, Company, Street Address, City, State, Zip Code, Email Address, Phone. Not all columns may be present for each row of data.
If the text contains a middle initial, include it in the First Name field, separated from the first name by a space character and followed by a period.
Zip codes may contain 5 or 9 digits. If they are 9 digits, a dash character should separate the first 5 digits from the next 4 digits. Zip codes should be stored as text, not numbers.
Include any data that doesn't fit these categories in a column labeled Other, with elements separated by commas.
This is an example for a contact list, but you can use a similar prompt for many other types of unformatted or poorly structured data.
Multiple steps
A very complicated prompt may generate errors, but you can break the process down into several steps to avoid confusing the system. In the example above, instead of specifying the columns "First Name" and "Last Name," and requesting special handling for middle initials, request only a "Name" field in the first prompt. In a second prompt, you could state:
Reformat the table above, separating the Name column into Salutation, First Name, Middle Initial, Last Name, and Suffix columns. Not all elements may exist for each name.
Supplementing the original data
You can also have the system analyze the data to create additional fields based on other content. In the example above, here's how you can analyze email addresses:
For the table above, add another column for Email Type. If the email address does not contain a valid URL or is missing, the value for this field should be "Invalid". If the email address contains gmail.com, outlook.com, icloud.com, yahoo.com, aol.com, or msn.com, set the value for this field to "Personal". All other rows should have the value "Business".
Substituting Characters
Sometimes, raw international data may come with multi-character strings representing accented characters or digraphs, or you may want to replace accented characters with their unaccented versions for easy import. (This is common when the imported text was encoded in ISO 8859-1 instead of UTF-8.)
A prompt like this can accomplish this task simply, and is much more efficient than an iterative search and replace. It usually works best to do this before parsing your data into a table.
Replace all instances of the text below as follows, ignoring quotation marks:
"á" with "á"
"é" with "é"
"ó" with "ó"
"Ã-" with "í"
"ñ" with "n"
"Á" with "A"
"ß" with "ss"
Complete all of one substitution before beginning the next on the list. All substitutions are case sensitive.
Preparing for export/import
When complete, you can extract the result for import into another system with a prompt similar to:
Create a tab-delimited text file from the above table.
or
Create a comma-delimited text file from the above table.
If you're going to require a comma-delimited text file to import data into your new system, prior to doing this you should strip out any remaining commas in your table prior to exporting so they will end up in the proper field when imported. Using the method above, a simple prompt like this can assure that your data will import properly.
In the table above, replace any remaining commas with the "|" character.
Do you regularly need to move data from a variety of sources into your database-driven systems? Matrix Potential's data science team can help you with projects of any size. Get in touch with us here.
We're now seeing MOST of our new projects involving a Generative AI component. Our clients are rightly concerned that if they don't jump in right now, their competitors will develop an advantage that will be difficult to overcome.
For content creation, customer support, competitive landscaping, business analysis, and more, Generative AI tools like ChatGPT can make your job easier and more efficient. With them, you can uncover information that you wouldn't easily find with traditional web or keyword searches.
By integrating your own data and research you can be even more effective, developing and synthesizing guided insights that are otherwise not apparent.
The reality is your competitors are using these tools today. To remain relevant, you should be actively considering ways you can improve your business operations with generative AI tools. And start testing some simple applications now.
If you're not sure where to get started, here are some ideas:
There are some risks.
Data security: Be cautious with the information you input into generative AI systems. This data could be used for any purpose, including training the AI. Your sensitive private data could become part of a future answer provided by the system to other users.
Examine your license agreements closely and opt for paid licenses with data use restrictions when you can. More importantly, be sure your employees are trained on the risks. Restrict the information they may provide to the tools and limit their use to your licensed products.
Accuracy of content: Generative AI can produce "hallucinations," or incorrect / false information. These systems predict words based on prompts and previous outputs, not verified facts.
See this cautionary story:
Always fact check everything provided in the output from the system, and use common sense. If you're going to be posting the content or using it with customers, is it accurate? Does it reflect the tone you want? Take some time for real editing before using the content externally.
Rapidly evolving technology: The generative AI landscape is rapidly changing. Tools you choose today might be outdated tomorrow.
To stay flexible, segregate your data from the systems you use. Then, you'll be able to plug it into a new tool much more easily when it becomes available. But remember, if you choose to wait, you'll only be farther behind when you decide to get started.
Matrix Potential has extensive experience with generative AI and machine learning systems. If you're ready to explore how generative AI can benefit your organization, click here.