Lessons learned integrating Gemini API
By Gemma Lara Savill
Published on August 28, 2024
My recent participation in the Gemini API Developer Competition gave me some insight into the capabilities and challenges of this powerful large language model (LLM). Here I share the lessons I learned through trial and error, along with a set of best practices to help you get started with Gemini and maximize your results.
Pricing
I was disappointed to discover that there is no free tier for the Gemini API if you are in Spain; this applies to the whole EU. There is no other choice but to set up a Google Cloud billing account and pay your way.
Given the cost of using the Gemini API, it's essential to implement strategies to minimize expenses. Here are some steps you can take to mitigate the cost:
- Maximize the benefits of using Google AI Studio. Since all prompt usage is free in the Google AI Studio console, I've adopted it for my setup, tuning, and testing processes.
- Fake data source: Once you have the Gemini API integration working in your app, consider using a fake data source as a placeholder for your real one. This allows you to develop your UI and other components independently of the LLM, using canned responses from the fake data source. By reducing API calls during development, you can streamline your workflow and save costs.
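As a minimal sketch of this pattern (the interface and class names here are hypothetical illustrations in Python, not the Android app's actual code), the fake source returns a canned response so the UI layer never needs a live API call during development:

```python
from abc import ABC, abstractmethod

class BirdDataSource(ABC):
    """Abstraction over the model backend so the UI never talks to Gemini directly."""
    @abstractmethod
    def identify_bird(self, description: str) -> str: ...

class FakeBirdDataSource(BirdDataSource):
    """Returns canned responses: no API calls, no cost, while building the UI."""
    def identify_bird(self, description: str) -> str:
        return '{"species": "Eurasian magpie", "confidence": "high"}'

# Swap in the real Gemini-backed implementation only when you need live answers.
source: BirdDataSource = FakeBirdDataSource()
print(source.identify_bird("black and white, long tail"))
```

Because the rest of the app depends only on the interface, switching between the fake and the real implementation is a one-line change.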
- Select the Gemini model that best suits your requirements while considering the cost. Google recently slashed Gemini Flash model prices by approximately 70%! Based on my experience, the Gemini 1.5 Flash model is well-suited for high-volume, low-latency applications like summarization, categorization, and multi-modal understanding. Here is a link to keep up with the latest models: https://deepmind.google/technologies/gemini/
System instructions are key
Getting answers that fit the use case of your project can be challenging.
An example: for my Android app entry to the Gemini API Developers Competition, I wanted Gemini to identify a bird species from a series of physical characteristics.
First problem: the word "bird" has several meanings in English, including slang with a negative interpretation, which can trigger safety flags.
To solve this, my first thought was to train the model, but I found you can get excellent results simply by defining clear system instructions.
1. Define a clear role for the model
The first and most important step is to define a clear role for the model.
Once my model was given clear instructions, such as "Your job is to identify the species of bird that a user has encountered", "You are a scientist, you love birds (animals) and nature in general", and "All answers only apply to birds (animals)", I no longer had security issues flagged due to the use of the word "bird".
Model role key points to consider:
- Be specific: The more specific you are in defining the model's role, the better it will be able to understand and complete its task.
- Provide examples: Giving the model examples of the kind of tasks it will be asked to perform can also help it understand its role.
- Set boundaries: It's important to set boundaries for the model's role. For example, you could say "You will not be able to provide information about birds that do not exist in the real world."
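As a sketch of how these three points fit together, here is one way to assemble a role, examples, and boundaries into a single system instruction (the example bird description and variable names are my own illustrations, not from the app):

```python
# Compose a system instruction from a role, examples, and boundaries.
ROLE = (
    "You are a scientist. You love birds (animals) and nature in general. "
    "Your job is to identify the species of bird that a user has encountered."
)
EXAMPLES = (
    "Example: given 'small, red breast, seen in a UK garden', "
    "answer with the species 'European robin'."
)
BOUNDARIES = (
    "All answers only apply to birds. You will not provide information "
    "about birds that do not exist in the real world."
)
SYSTEM_INSTRUCTION = "\n".join([ROLE, EXAMPLES, BOUNDARIES])

# With the google-generativeai Python SDK, this is passed at model creation:
# model = genai.GenerativeModel("gemini-1.5-flash",
#                               system_instruction=SYSTEM_INSTRUCTION)
print(SYSTEM_INSTRUCTION)
```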
2. Define output instructions
To ensure consistent and usable responses from the Gemini model, it's crucial to specify a clear output format. While JSON and plain text are common options, the ideal format often depends on the specific prompt and desired use case.
For instance, I have been working with a single model and role, and I request answers in different JSON schemas for different queries. However, it's essential to be mindful of potential inconsistencies: sometimes the model becomes overly creative and alters the JSON schema names, which results in parsing errors. To mitigate this, add a detection and mitigation strategy to your code and resend the query with an explicit instruction, such as "Please ensure the answer is in this format: [desired JSON schema]."
This attention to output format is crucial for effective integration and data processing.
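A minimal sketch of such a detection-and-retry strategy in Python (the expected keys and helper name here are hypothetical, standing in for your app's real schema):

```python
import json

EXPECTED_KEYS = {"species", "confidence"}  # hypothetical schema for the bird app

def parse_or_retry_prompt(raw: str):
    """Return (parsed, None) on success, or (None, retry_prompt) when the
    response is not valid JSON or the model renamed the schema fields."""
    retry = ('Please ensure the answer is in this format: '
             '{"species": "...", "confidence": "..."}')
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, retry
    if set(data) != EXPECTED_KEYS:
        return None, retry
    return data, None

# A well-formed answer parses; a renamed field triggers a retry prompt.
ok, _ = parse_or_retry_prompt('{"species": "magpie", "confidence": "high"}')
bad, retry = parse_or_retry_prompt('{"bird_species": "magpie"}')
```

When the retry prompt comes back, you resend it to the model and parse again; capping the number of retries keeps a misbehaving model from looping.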
3. Define the Temperature
This setting will determine how "creative" your model gets. A high temperature will give you more varied and creative answers, and a low temperature will result in more predictable and repetitive outputs.
Key points to consider regarding temperature:
- Balancing creativity and accuracy: While a higher temperature can lead to more creative and diverse outputs, it can also increase the risk of generating incorrect or irrelevant information. Finding the right balance is crucial.
- Experimentation: Experimenting with different temperature settings can help you determine the optimal level for your specific use case.
- Task-specific adjustments: The ideal temperature may vary depending on the nature of the task. For example, a higher temperature might be suitable for creative writing, while a lower temperature might be better for factual queries.
You can find the Temperature setting in Google AI Studio, on the right under the model selector. In your code, you set it when you define your model.
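As a sketch, assuming the google-generativeai Python SDK, which accepts generation settings as a plain dict; the specific temperature values below are illustrative only, not recommendations:

```python
# Generation settings in the dict form the google-generativeai SDK accepts
# (the values are assumptions; experiment to find what fits your use case).
FACTUAL_CONFIG = {"temperature": 0.2}   # predictable answers, e.g. species lookups
CREATIVE_CONFIG = {"temperature": 0.9}  # varied phrasing, e.g. descriptive text

# Passed when creating the model, or overridden per request:
# model = genai.GenerativeModel("gemini-1.5-flash",
#                               generation_config=FACTUAL_CONFIG)
# response = model.generate_content(prompt, generation_config=CREATIVE_CONFIG)
```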
4. Configure your Safety Settings
This is an important setting, as safety is a central concern when working with LLMs.
The Gemini API currently lets you configure four safety settings: the Harassment, Hate speech, Sexually explicit, and Dangerous content categories.
Each one of these settings can be set to five different filter levels:
- unspecified: The threshold was not specified
- low and above: Content with negligible harm is allowed
- medium and above: Content with negligible to low harm is allowed
- only high: Content with negligible to medium harm is allowed
- none: All content is allowed regardless of harm
I recommend you start with strict filtering and watch out for the "SAFETY" error coming from the API response: "Prompt was blocked due to safety reasons. Inspect safetyRatings to understand which safety category blocked it." If you get it too often, you can relax the category that is getting blocked.
Key considerations regarding safety settings:
- Contextual understanding: While these settings can help filter harmful content, it's important to remember that LLMs are still learning and may not always understand the context of a prompt.
- Continuous monitoring: Regularly reviewing the "SAFETY" error messages can help you identify areas where the safety settings may need adjustment.
- Human oversight: Even with robust safety settings, human oversight remains essential for ensuring the ethical and responsible use of language models.
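As a sketch, assuming the google-generativeai Python SDK, which accepts safety settings in this dict form; this starts every category at strict filtering, to be relaxed individually if a category blocks too often:

```python
# The four configurable categories, each set to block low-harm content and above.
SAFETY_SETTINGS = [
    {"category": "HARM_CATEGORY_HARASSMENT",        "threshold": "BLOCK_LOW_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH",       "threshold": "BLOCK_LOW_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_LOW_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_LOW_AND_ABOVE"},
]

# Passed when creating the model:
# model = genai.GenerativeModel("gemini-1.5-flash",
#                               safety_settings=SAFETY_SETTINGS)
```

If one category keeps triggering the "SAFETY" error on legitimate prompts, change only that entry's threshold (for example to "BLOCK_ONLY_HIGH") rather than loosening all four.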
Conclusion
As the field of AI continues to evolve, LLMs like those behind the Gemini API provide a valuable resource for developers looking to create cutting-edge applications. By carefully considering factors like pricing, system instructions, and safety, you can effectively harness an LLM's power to build innovative and impactful solutions.
Staying informed, experimenting, and embracing a spirit of innovation are key to contributing to the advancement of AI and creating groundbreaking applications that shape the future.