Deep Seeking – Writing in a Whole New Model

How Did DeepSeek Catch Up at a Fraction of the Cost?

Like everyone else, I was more than a little surprised by DeepSeek. I’d been plugging along using the other models and tools for artificial intelligence writing, planning, and crafting. I also still think the best writing tool is my brain and my hands, but there you have it.

Still, I teach and explore the world of AI and was fascinated by what just happened in the world of Large Language Models and the boom-snap of DeepSeek and its ability to tank Nvidia’s stock overnight. So, as one of the limited pool of tech writers paying attention to how these things take place, I figured I’d give a brief explanation and share a few thoughts here, with a nod to the folks who taught me a thing or two along the way.

DeepSeek’s leap comes down to four major innovations (and some smaller ones). Here’s the lowdown:

  1. They distilled from a Leading Model, for starters
    DeepSeek likely distilled its model from an existing one—most likely Meta’s Llama 3, though they could have accessed OpenAI’s GPT-4 or Anthropic’s Claude. Distillation involves training a new model using an existing one as the teacher, much like OpenAI’s GPT-4 Turbo, which provides solid performance at lower costs by leveraging GPT-4 as a teacher. (There’s a minimal sketch of the idea after this list.)
    This approach slashes the time and cost of creating a training set, but it has limits. Since your starting point is always someone else’s previous release, leapfrogging the competition becomes far more challenging. If DeepSeek used OpenAI or Anthropic’s models, it would violate their terms of service—but proving that is notoriously difficult.
  2. Inference via Cache Compression
    DeepSeek slashed the cost of inference (the process of generating its responses) by compressing the cache that the model uses to make predictions. This breakthrough was clever, but it wasn’t entirely unexpected; it’s a technique that others probably would have figured out before long. More importantly, DeepSeek published the method openly, so now the entire industry can benefit from their efforts.
  3. Mixture of Experts Architecture
    Unlike traditional LLMs, which load the entire model during training and inference, DeepSeek adopted a “Mixture of Experts” approach. This uses a guided predictive algorithm to activate only the necessary parts of the model for specific tasks.
    DeepSeek needs 95% fewer GPUs than Meta because, for each token, they train only 5% of the parameters. This innovation radically lowers costs and makes the model far more efficient and cheaper to run. (A toy gating sketch appears after this list.)
  4. Reasoning Model Without Human Supervision
    DeepSeek didn’t just match leading LLMs like OpenAI’s GPT-4 in raw capability; they also developed a reasoning model on par with OpenAI’s o1. Reasoning models combine LLMs with techniques like chain-of-thought (CoT) prompting, enabling them to correct errors and make logical inferences—qualities predictive models lack.
    OpenAI’s approach relied on reinforcement learning guided by human feedback. DeepSeek trained their model on math, code, and logic problems, using two reward functions—one for correct answers and one for answers with a clear thought process. Instead of supervising every step, they encouraged the model to try multiple approaches and grade itself. This method allowed the model to develop reasoning capabilities independently. (A toy version of those two reward functions also appears after this list.)
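
To make item 1 a little more concrete, here is a minimal sketch of what distillation looks like in code. This is emphatically not DeepSeek’s pipeline: the tiny stand-in models, the fake token batch, and the plain KL-divergence loss are placeholders chosen only to show the mechanics of a student learning from a teacher’s outputs.

```python
# Minimal distillation sketch: a small "student" learns to match the output
# distribution of a larger "teacher". Models, data, and sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden = 1000, 64
teacher = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))            # stand-in "big" model
student = nn.Sequential(nn.Embedding(vocab, hidden // 2), nn.Linear(hidden // 2, vocab))  # smaller model

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
tokens = torch.randint(0, vocab, (8, 16))   # fake batch of token IDs

for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(tokens)    # the teacher's "soft labels"
    student_logits = student(tokens)
    # KL divergence nudges the student's distribution toward the teacher's.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```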
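
For item 3, here is a toy “Mixture of Experts” layer showing how a small router sends each token to only a couple of experts, so most of the layer’s parameters sit idle for any given token. Again, this is an illustrative sketch, not DeepSeek’s architecture; real MoE layers add load balancing, shared experts, and far more careful routing.

```python
# Toy Mixture of Experts layer: a router scores the experts for each token and
# only the top-k experts actually run. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)   # decides which experts see each token
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```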
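
Finally, item 4 mentions two reward functions: one for a correct answer and one for a clear thought process. Here is a hedged, toy guess at what such reward functions might look like; the tag format and score values are invented for illustration, and the reinforcement-learning loop that would consume these rewards is far more involved.

```python
# Toy reward signals: one for a correct final answer, one for showing work.
# The <think>/<answer> tags and the scores are invented for illustration.
import re

def correctness_reward(output: str, expected: str) -> float:
    """1.0 if the text inside <answer>...</answer> matches the expected answer."""
    match = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == expected else 0.0

def format_reward(output: str) -> float:
    """Small bonus when the model lays out its reasoning before answering."""
    has_thinking = "<think>" in output and "</think>" in output
    has_answer = "<answer>" in output and "</answer>" in output
    return 0.5 if has_thinking and has_answer else 0.0

sample = "<think>17 + 25 = 42</think><answer>42</answer>"
print(correctness_reward(sample, "42") + format_reward(sample))  # 1.5
```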

The TL;DR 

  1. Innovation Beats Regulation
    DeepSeek’s success is a reminder that competition should focus on technological progress, not regulatory barriers. Innovation wins.
  2. Lower Costs All Around
    DeepSeek’s architecture dramatically reduces the cost of running advanced AI models—in both dollars and energy consumption. Their open-source, permissively licensed approach means the whole industry, including competitors, benefits.
  3. Models Are Becoming Commodities
    As models become cheaper and more widely available, the value will shift toward building smarter applications rather than simply integrating a flashy “AI button.” Product managers who can creatively apply AI will drive the next wave of real-world impact.
  4. NVIDIA’s Moat Is Shrinking
    DeepSeek’s GPU efficiency challenges assumptions about NVIDIA’s dominance. If others adopt similar techniques, the grip of chipmakers on the AI economy may loosen significantly.

*For some DeepSeek FAQs from folks who know a lot more than I do, go here.

Four Ways Gen AI Will Help Tech Writers in 2024

Image courtesy of Nick Morrison on Unsplash

We’re a few weeks into 2024, and now that we’re past the “best of” lists and the “year in review” lists, there’s a bit of time to take a look at what we might find most productive in the coming quarters. I always enjoy taking a few deep breaths once the year-start frenzy fades and looking at what I might find truly useful as the daffodils peek their heads up in the park and the true momentum of the year takes shape. I shake off the winter blues and roll my sleeves up to dig in.

This year, my company is more sure than ever that we will harness the ever-growing powers of AI as it enters a more mature phase, and I, for one, am grateful we’re embracing that stance. Artificial Intelligence, especially generative AI, is no longer just a buzzword. Companies are beginning to sift through the hype to discover what is really providing value and what was just marketing hubris. I’m glad of it. I suppose I’m glad because I work for a company that knows the difference and tends to put its weight behind the real thing. Sure, we have some applications that are neither artificial nor intelligent, but we own up to the ones that are plain data and machine learning, and we boast about the ones that have real weight to throw around. And that is fun and ambitious.

So I enjoy rolling up my sleeves to explore the ways that Generative AI might improve my work, and I enjoy tossing out the ways that it is a big, fat dud. So let’s look at the four ways I think it might actually improve what I do.

Drafting

First drafts are the things that take up the bulk of any writer’s time, whether tech writer or fiction writer. We know what it is that we want to say; we just aren’t sure how to get started. It’s not writer’s block per se, it’s just that the words in our brains don’t spill out onto the page as quickly as we’d like or as smoothly as they should. Our years of training are well suited to editing and refining, which we have likely already begun in our heads. It’s the typing that can be painful, because it simply isn’t as fast as our thought processes.

But a Generative AI tool is as fast. Faster, even. If we can spin up a good prompt, we can get some of the process started, and that is often enough to grease the wheels so we can take it from there.

Most likely the way we will be able to effectively harness what Gen AI really has to offer is to become adept at crafting and refining the prompts to get us started. I began writing chapters of my book recently by asking ChatGPT to help me organize the table of contents, then I asked it to revise that TOC because it wasn’t quite the way I liked it. The actual TOC looks nothing like it did when good ol’ AI first started it, but what was nice is that it gave me liftoff. I knew in my head what I wanted to do, but the way my creative brain was working was to think of the book as the whole, not its parts. That is awful for a writer. As the saying goes, “you can’t boil the ocean.” Indeed you can’t. And while AI wrote not a single word of this post, that’s because I have been able to think of its component parts. With my book, I was thinking of the whole thing, start to finish, and I needed a hand in visualizing what the chapters might look like. By asking AI to help me out with that, I was able to think of the pieces rather than the whole. It was only then that I could sit down and begin to write.

With that in mind, we as creators will be able to ask Gen AI not to write for us, but to draft for us, so that we can get out of the muck that is stuck in our brains and begin to refine and edit and revise after the foundation is laid. What is left behind as detritus will be much like what we left behind us when we were drafting ourselves – an unrecognizable skeleton of tossed-aside word salads. It requires our fine touch to render the work useful and beautiful. Gen AI will have helped us get there faster, much the same way moving from horse to car allowed us to move from place to place more quickly, too.

Automating Q&A

As a tech writer, I cannot even begin to tell you how many times I have been asked to write an FAQ section for documentation.

My answer to this request is always the same. I will not.

I will not write an FAQ because if there are frequently asked questions, I should answer those within the documentation. Sheesh!

And yet.

There remain some questions that, indeed, users will face time and again and that are unlikely to be answered in the documentation. I get it. No matter how much of a purist I am, I understand that there are frequently asked questions that need frequent answers.

So a technical writer can collate those questions and help develop a chatbot that integrates with a product’s or company’s knowledge repository to answer those repetitive questions effectively. This bot can instantly provide responses in a way that no live writer could, through a call-and-response mechanism that is smoother than static documentation. By leveraging AI, we can ask our tools to ingest data, interpret our users’ needs, and provide output that is both relevant and accurate. This can enhance the user’s experience by responding immediately, whereas a human might need time to look something up, or might be off-hours or otherwise detained.

AI-powered chatbots are efficient, interactive, and fast. But they serve only after technical writers feed them all they need in terms of information. It’s vital to remember that our AI tools are only as good as the data they are given to start with.
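
Here is a minimal sketch of the retrieval half of such a bot: split the documentation into snippets, match the user’s question against them, and hand the best matches to an LLM to phrase the answer. The snippets are made up, the retrieval uses plain TF-IDF from scikit-learn, and ask_llm() is a placeholder for whatever model or API you actually use.

```python
# Sketch of a docs-backed Q&A bot: retrieve the most relevant documentation
# snippets for a question, then hand them to an LLM to phrase the answer.
# TF-IDF retrieval is real (scikit-learn); ask_llm() is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

doc_snippets = [
    "To reset your password, open Settings > Account and choose Reset.",
    "Exports are limited to 10,000 rows per request.",
    "The API accepts JSON payloads over HTTPS only.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(doc_snippets)

def ask_llm(prompt: str) -> str:
    return "(model response would go here)"   # swap in your chat model of choice

def answer(question: str, top_k: int = 2) -> str:
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_matrix)[0]
    best = scores.argsort()[::-1][:top_k]                      # highest-scoring snippets
    context = "\n".join(doc_snippets[i] for i in best)
    prompt = f"Answer using only this documentation:\n{context}\n\nQuestion: {question}"
    return ask_llm(prompt)

print(answer("How do I reset my password?"))
```

The point of the retrieval step is exactly what the paragraph above says: the bot is only as good as the documentation we feed it.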

Data

Technical writers will also be very well served by the data that AI can give us. Back when I was in graduate school, we worked very hard to manually enter volumes of information to build dictionaries of strings of phrases and what they meant in general syntax. It was difficult and time-consuming. Today, though, thanks to AI-powered tools, we could complete that project in a fraction of the time.

The way AI data will benefit tech writers is that it will allow us to gather truly valuable insights into our users’ behaviors, so we can stop making what we once knew were educated guesses about their pathways. The tools we have now measure user engagement, search-pattern behaviors, and even the effectiveness of our content through click rates, bounce patterns, and more, and they serve it all up in nice, tidy numbers. We can use this to optimize our content, identify the gaps we see, and close them quickly.

Leveraging analytics data now allows tech writers to focus on the content our users access most often, the content that interests them the most, and the keywords they search – and do or do not find – with the greatest success. Plus, we can now see just what stymies our users, and all of this empowers us to create targeted content that will greatly improve the user experience.
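
As a hedged illustration of the kind of analysis described above, here is a small sketch that takes a hypothetical export of page views and search queries and surfaces the most-visited pages and the searches that returned nothing, which is exactly where the content gaps live. The column names and data are invented.

```python
# Hypothetical docs analytics: most-visited pages and searches with no results.
# Column names and data are invented for illustration.
import pandas as pd

views = pd.DataFrame({
    "page": ["install", "install", "api-auth", "faq", "api-auth", "api-auth"],
    "seconds_on_page": [45, 80, 12, 200, 9, 15],
})
searches = pd.DataFrame({
    "query": ["rotate api key", "install on mac", "rotate api key", "sso setup"],
    "results_found": [0, 4, 0, 0],
})

# Which pages get the most visits, and how long do readers stay?
top_pages = views.groupby("page")["seconds_on_page"].agg(["count", "mean"])
print(top_pages.sort_values("count", ascending=False))

# Searches that returned nothing point at content that doesn't exist yet.
gaps = searches.loc[searches["results_found"] == 0, "query"].value_counts()
print(gaps)
```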

Optimization

More than just a buzzword, content optimization is probably the most powerful benefit of AI-enabled tech writing. Asking a good AI tool to analyze technical documentation and provide feedback on readability against a style guide and structural rules is probably the best thing we could ask for. Sure, we’ve got things like Grammarly and Acrolinx to do spot-checking. That’s all well and good. But now AI can help us enhance accessibility, it can help us tailor content for specific audiences, and it can allow or disallow entire phrases and terms. We can advance our proofreading capabilities far more than we could before by loading in custom dictionaries and guides.

Tools like Grammarly are basic, but with advancing natural language processing (NLP) tools, the arsenal we have just grows. We can pass our work through a domain-specific tool that offers a profession-specific set of guidelines and lets us examine under a microscope things like acronyms, codes, and more.
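
A tiny sketch of what that kind of automated style pass can look like: flag disallowed terms from a custom word list and compute a rough readability proxy. The term list and thresholds here are invented for illustration, not pulled from any real style guide.

```python
# A rough automated style pass: flag disallowed terms from a custom word list
# and compute a crude readability proxy. Terms and thresholds are invented.
import re

DISALLOWED = {"utilize": "use", "in order to": "to", "leverage": "use"}

def style_check(text: str) -> list[str]:
    findings = []
    for banned, preferred in DISALLOWED.items():
        if re.search(rf"\b{re.escape(banned)}\b", text, re.IGNORECASE):
            findings.append(f"Replace '{banned}' with '{preferred}'.")
    # Crude readability proxy: average sentence length in words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if sentences:
        avg_len = sum(len(s.split()) for s in sentences) / len(sentences)
        if avg_len > 25:
            findings.append(f"Average sentence length is {avg_len:.0f} words; aim for under 25.")
    return findings

print(style_check("In order to utilize the export feature, leverage the API."))
```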

The time savings there isn’t total, of course. We’ll still need to copyedit. But imagine, if you will, that every first or second pass of all of our text is now done in mere seconds. All the typos are caught by AI so that we can clean up and polish with an editor’s eye, not just a proofreader’s.

Sit back with a cup of coffee and really read, not just grab a pencil and mark up a piece of paper the way we used to.

My, that’s nice.

I’m Your Writer, Not Your Everything

Or, Hey there, Have You Seen My SME?

I couldn’t even think of a cool graphic or photo to go with this, so here we are, graphic-less. That’s probably good, because I am not starting off the year on a happy, optimistic note, to be honest.

Let’s get one thing clear. I am an SME. I am absolutely, without a doubt, a subject matter expert. One hundred percent. Although I spend most of my time writing about a software product in the area of clinical trials data analytics, I am decidedly not an expert in clinical trials data analytics. Nor was I hired to be.

I am a subject matter expert in technical writing, in grammar and syntax.

I am supposed to be just that. I deliver on that promise every day. I rarely make an error in my delivery of high-quality content. I know what readers need and how they need it to be organized. I understand where the reader’s eye is likely to go and which concepts should be placed where. More importantly, I can spot a typo in difficult language, and if I am unfamiliar with a term, I know where to go to look it up, when to insist on that word, and when to choose another. I build dictionaries and document frameworks. I organize tables of information. My spelling is almost perfect, and so is my grammar.

What I am not is a subject matter expert in data analytics. Do you want to know a secret? I was not a subject matter expert in mainframe banking software when I was a technical writer for my former company, either. And I was not a subject matter expert in cybersecurity when I worked in that field. Are you sensing a pattern or is it just me?

Now is the moment when I sigh.

The burden on technical writers is becoming ever heavier. Our teams are becoming thinner and the demands on us are growing. We have always carried the weight of justifying our positions when budgets tighten, showing our value since our deliverable often doesn’t bring a 1:1 ROI with it, and convincing mid- and upper-level leadership that continued investment in upskilling is essential. As the asks of today’s software teams become greater and the speed at which we deliver becomes faster, demonstrating that value seems to get tougher and tougher.

I generally work with great people and teams, and on balance the work that I do is seen as necessary, if not critical, to the end product. But more and more I learn of teams that want their tech writers to be so fully embedded in end-to-end development that the lines between coder and UX designer and writer are not just comfortably blurred, they are erased.

At the end of the day, all I know for certain is this: I will remain a subject matter expert in grammar, syntax, and spelling, and I will feel quite comfortable in that area. When I am called upon to be a soup-to-nuts pro in a complex piece of software, I think I’ll start asking my development teams to explain to me the difference between a gerund and a participle, and to recommend when I should use each in the installation manual I am working on today.

Singing a New Tune in AI – Prompt Tuning

We are all well aware that AI is the hottest topic everywhere. You couldn’t turn around in 2023 without hearing someone talk about it, even if they didn’t know what the heck it was, or is. People were excited, or afraid, or some healthy combination of both.

From developers and engineers to kids and grandmas, everyone wanted to know a thing or two about AI and what it can or cannot do. In my line of work, people were either certain it would take away all of our jobs or certain that it was the pathway to thousands of new jobs.

Naturally I can’t say for certain what the future holds for us as tech writers, but I can say this – we as human beings are awful at predicting what new technologies can do. We nearly always get it wrong.

When the television first arrived, there were far more who claimed it was a fad than those who thought it would become a staple of our lives. The general consensus was that it was a mere flash in the pan and would never last more than a few years. People simply couldn’t believe that a square that brought images into our homes would become a thing that eventually brought those images to us in every room of our homes, twenty-four hours a day, offering news and entertainment, delivering everything we needed all day and all night. They couldn’t fathom that televisions would be so crystal clear and so inexpensive that every holiday season the purchase of a bigger, better, flatter, thinner television would be a mere afterthought.

And yet here we are.

So now that we’ve got that out of the way, on to total world domination!

But seriously.

If you aren’t already using AI, or at least Gen AI in the form of something like ChatGPT, where are you, even? At least have a little play around with the thing. Ask it to write a haiku. Let it make an outline for your next presentation. Geez, it’s not the enemy.

In fact, it’s so much not the enemy that it can help you outline your book (like I’ve done), revise a paragraph (like I’ve done), or tweak your speech (like I have done many, many times). The only thing you really need to understand here is that you are, indeed, smarter than the LLM. Well, mostly.

The LLM, or large language model, does have access to a significantly grander corpus of text than you can recall at any given moment. That’s why you’d be unlikely to win on Jeopardy if you were to compete against it. It’s also why an LLM might make some stuff up, or fill in some fuzzy details, if you ask it to write a cute story about your uncle Jeffrey for the annual Holiday story-off. (What? Your family does not actually have an annual story-off? Well, get crackin’, because those are truly fun times…fun times…). The LLM knows nothing specific about your uncle Jeffrey, but it does know a fair bit about, say, the functioning of a carburetor if you need to draft a paragraph about that.

The very, very human part is that you must have expertise in how to “tune” the prompt you offer to the LLM in the first place. And the second place. And the third place!

Prompt tuning is a technique that allows you to adapt LLMs to new tasks by training a small number of parameters. Prompt text is added to guide the LLM toward the output you want, and the technique has gained quite a lot of attention in the LLM world because it is both efficient and flexible. So let’s talk more specifically about what it is and what it does.

First, prompt tuning offers a more efficient approach than fine-tuning the entirety of the LLM, which means faster adaptation as you move along. Second, it’s flexible in that you can apply tuning to a wide variety of tasks, including NLP (natural language processing), image classification, and even generating code. With prompt tuning, you can inspect the parameters of your prompt to better understand how the LLM is guided toward the intended outputs. This helps us understand how the model makes decisions along the way.
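
To show what “training a small number of parameters” means in practice, here is a hedged sketch of soft prompt tuning: the model itself stays frozen, and only a short block of prompt embeddings, prepended to every input, gets trained. The tiny stand-in model and fake data are placeholders; real setups attach soft prompts like these to a pretrained LLM.

```python
# Soft prompt tuning in miniature: the model is frozen, and only a short block
# of prompt embeddings prepended to each input is trained. The tiny stand-in
# model and fake data are placeholders for a real pretrained LLM.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, prompt_len, batch = 1000, 64, 8, 4
embed = nn.Embedding(vocab, dim)
frozen_lm = nn.Sequential(nn.Linear(dim, dim), nn.Linear(dim, vocab))  # pretend pretrained model
for p in list(embed.parameters()) + list(frozen_lm.parameters()):
    p.requires_grad = False                       # the big model never changes

soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)        # the only trainable weights
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-2)

tokens = torch.randint(0, vocab, (batch, 16))     # fake task inputs
labels = torch.randint(0, vocab, (batch,))        # fake task labels

for step in range(50):
    x = embed(tokens)                                             # (batch, 16, dim)
    x = torch.cat([soft_prompt.expand(batch, -1, -1), x], dim=1)  # prepend the learned prompt
    logits = frozen_lm(x).mean(dim=1)                             # pool to one prediction each
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```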

The biggest obstacle when getting started is probably designing an effective prompt at the outset. To design an effective prompt, it is vital to consider the context and structure of the language in the first place. You must imagine a plethora of considerations before just plugging in a prompt willy-nilly, hoping to cover a lot of territory. Writing an overly complex prompt in hopes of winnowing it down later might seem like a good idea, but in reality what you’ll get is a lot of confusion, resulting in more work for yourself and less efficiency for the LLM.

For example, if you work for a dress designer that creates clothing for petite women and you want to gather specific insights about waist size, but you don’t want irrelevant details like shoulder width, arm length, or competing companies, you might try writing a prompt to gather that information. The challenge is to write a prompt that is broad enough, asking the AI model for information about your focus area (petite dresses), while filtering out unrelated information and avoiding details about competitors in the field.

Good Prompt/Bad Prompt

Bad prompt: “Tell me everything about petite women’s dresses, sizes 0 through 6, 4 feet tall to 5 feet 4 inches, 95 lbs to 125 lbs, slender build by American and European designers, and their products XYZ, made in ABC countries from X and Y materials.”

This prompt covers too many facets and is too long and complex for the model to return valuable information or to handle efficiently. It may not understand the nuances with so many variables.

A better prompt: “Give me insights about petite women’s dresses. Focus on sizes 0 to 6, thin body, without focusing on specific designers or fabrics.”

In the latter example, you are concise and explicit, while requesting information about your area of interest, setting clear boundaries (no focus on designers or fabrics), and making it easier for the model to filter.

Even with the second prompt, there is the risk of something called “overfitting,” where the prompt is too narrow or too specific. That will lead you to refine the prompt by adding or removing detail: overfitting can call for more generalization or for added specificity, depending on which direction you need to move.

You can begin a prompt tune with something like “Tell me about petite dresses. Provide information about sizes and fit.” From there, you can add levels of detail so that you refine the parameters as the LLM learns the context you seek.

For example: “Tell me about petite dresses and their common characteristics.” This allows you to scale the prompt gradually, get a sense of the training data available and its accuracy, and efficiently adapt your prompt without risking hallucination.

Overcoming Tuning Challenges

Although it can seem complex to train a model this way, it gets easier and easier. Trust me on this. There are a few simple steps to follow, and you’ll get there in no time (there’s a small sketch after the list, too).

  1. Identify the primary request. What is the most important piece of information you need from the model?
  2. Break it into small bites. If your initial prompt contains multiple parts or requests, break it into smaller components. Each of those components should address only one specific task.
  3. Prioritize. Identify which pieces of information are most important and which are secondary. Focus on the essential details in the primary prompt.
  4. Clarity is key. Avoid jargon or ambiguity, and definitely avoid overly technical language.
  5. As Strunk and White say, “omit needless words.” Any unnecessary context is just that – unnecessary.
  6. Avoid double negatives. Complex negations confuse the model. Use positive language to say what you want.
  7. Specify constraints. If you have specific constraints, such as avoiding certain references, state those clearly in the prompt.
  8. Human-test. Ask a person to see if what you wrote is clear. We can get pretty myopic about these things!
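
As a small, hedged illustration of steps 1 through 3, here is what breaking one sprawling request into single-purpose, prioritized prompts might look like in code. The ask_model() function is a placeholder, not a real API, and the prompts themselves are just examples.

```python
# Steps 1-3 in practice: break one sprawling request into single-purpose
# prompts and send them in priority order. ask_model() is a placeholder.
def ask_model(prompt: str) -> str:
    return f"(model response to: {prompt!r})"

# The sprawling version, kept here only for contrast -- never actually sent.
sprawling = ("Tell me everything about petite women's dresses, sizes 0 through 6, "
             "by American and European designers, made from various materials.")

# Step 2: break it into small bites; step 3: order them by priority.
focused_prompts = [
    "Give me insights about petite women's dresses in sizes 0 to 6.",         # primary
    "Summarize common fit considerations for petite dresses.",                # secondary
    "List typical fabrics used in petite dresses, without naming designers.", # tertiary
]

for prompt in focused_prompts:
    print(ask_model(prompt))
```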

The TL;DR

Prompt tuning is all about making LLMs behave better on specific tasks. Creating soft prompts to interact with them is the starting point of what will be an evolving process of quickly teaching them to adapt and learn, which is what we want overall. The point of AI is to eliminate redundancies so that we, the humans, can perform the tasks we enjoy and be truly creative.

Prompt tuning is not without its challenges and limitations, as with anything. I could get into the really deep stuff here, but this is a blog with a beer pun in it, so I just won’t. Generally speaking (and that is what I do here), prompt tuning is a very powerful tool to improve the performance of LLMs on very specific (not general) tasks. We need to be aware of the challenges associated with it, like the hill we climb with interpretability, and the reality that organizations that need to fine-tune a whole lot should probably look deeply at vector databases and pipelines. That, my friends, I will leave to folks far smarter than I.

Cheers!