The Green Deal, Long PDFs, and Simple Questions

The European Green Deal is a big deal. It’s the EU’s plan to make Europe climate-neutral by 2050.

But to understand how it all works, you need to go through a lot of official documents — strategy papers, action plans, and long reports. Most of them are in PDF format and not exactly easy to read.

Now imagine you just want to ask something like: “Which sectors have made the most progress?” or “How does the EU plan to support innovation?”

Good luck finding that quickly. Even with search, you still have to scroll, read, and piece everything together yourself.

The Problem: Public but Hard to Use

The EU does a great job making documents public. That's not the problem. The problem is that they're not written for easy access: they're long, formal, and not made for people who just want clear answers. In this project, I worked with three real EU documents.

These are official sources, but they’re not easy to use. You can search for words — sure — but you won’t get real answers to real questions.

The Solution: AI That Understands Documents

What if you could ask these documents a question — and get a clear answer, with a source? That’s what I tried to build.

This project uses something called RAG — Retrieval-Augmented Generation. It’s a mix of search and AI text generation. Here's the basic idea:

  1. Split the documents into small chunks of text
  2. Turn each chunk into a vector using Gemini embeddings
  3. Store all those vectors in a FAISS index so we can search them quickly (see the sketch right after this list)
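
To make that concrete, here is a minimal sketch of the indexing step. It assumes a naive fixed-size splitter, a flat L2 FAISS index, and a placeholder for the extracted PDF text; the actual chunking parameters in the notebook may differ.

import faiss
import numpy as np
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # Gemini API key

def split_into_chunks(text, size=1000):
    # Naive fixed-size splitter; the notebook may split more cleverly
    return [text[i:i + size] for i in range(0, len(text), size)]

document_text = "..."  # text extracted from the PDFs goes here
chunks = split_into_chunks(document_text)

# Embed every chunk with the Gemini embedding model
vectors = [
    genai.embed_content(
        model="models/embedding-001",
        content=chunk,
        task_type="retrieval_document"
    )["embedding"]
    for chunk in chunks
]

# Store the embeddings in a flat (exact) FAISS index
matrix = np.array(vectors, dtype="float32")
index = faiss.IndexFlatL2(matrix.shape[1])
index.add(matrix)

IndexFlatL2 does exact nearest-neighbour search, which is plenty for three documents; a much larger collection would call for an approximate index instead.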

When you ask a question, the following steps happen:

  1. We find the most relevant chunks
  2. Pass them (as context) to Gemini Pro
  3. Get back a natural answer — plus source info

Here’s a simplified version of what the code does:

# Simplified version; assumes `chunks`, `index`, and the helper
# `get_google_embedding` are already defined (see the indexing sketch above).
import numpy as np
import google.generativeai as genai

# Step 1: Turn each chunk into an embedding vector
embedding = genai.embed_content(
    model="models/embedding-001",
    content=chunk,
    task_type="retrieval_document"
)["embedding"]

# Step 2: Embed the question the same way and search the FAISS index
query_vector = np.array([get_google_embedding(user_question)], dtype="float32")
distances, indices = index.search(query_vector, 5)  # top 5 matches
context = "\n\n".join(chunks[i] for i in indices[0])

# Step 3: Ask Gemini Pro, with the top matching chunks as context
gemini = genai.GenerativeModel("gemini-pro")
response = gemini.generate_content(
    f"Based on this context:\n{context}\n\nAnswer the question:\n{user_question}"
)

The result? You get something like this:

Q: Does the Green Deal also include households in the EU?
A: The plan focuses on four main pillars: a predictable regulatory environment, faster access to funding, skills development, and open trade.
Sources: Industrial Plan 2023, page 4

So instead of searching blindly through dozens of pages, you ask — and the model responds with something useful, clear, and based on actual EU documents.
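
The "Sources" line comes from keeping track of where each chunk was taken from. Here is one possible way to carry that information along; the field names and the build_context helper are illustrative, not taken from the notebook.

# One metadata record per chunk, filled in while reading the PDFs
chunk_meta = [
    {"text": "chunk text here", "source": "Industrial Plan 2023", "page": 4},
    # ... one record per chunk
]

def build_context(hit_indices):
    # Join the retrieved chunk texts and collect their citations
    parts, cited = [], []
    for i in hit_indices:
        parts.append(chunk_meta[i]["text"])
        cited.append(f'{chunk_meta[i]["source"]}, page {chunk_meta[i]["page"]}')
    return "\n\n".join(parts), sorted(set(cited))

context, sources = build_context(indices[0])
print("Sources:", "; ".join(sources))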

✅ Gen AI Capabilities Used

This project demonstrates several core capabilities of generative AI:

Capability | Description
Document Understanding | Processes long, structured policy documents to extract relevant information.
Embeddings | Each chunk of text is turned into a vector using the Gemini API (embedding-001).
Retrieval-Augmented Generation (RAG) | Combines similarity search with text generation, ensuring grounded answers.
Vector Search / Vector Store (FAISS) | Uses FAISS to quickly find the most relevant chunks of information.
Grounding | The model answers only from the provided context, which keeps responses tied to the documents and reduces hallucinations.
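
The grounding behaviour comes mostly from how the prompt is written: the model is told to stick to the retrieved context. The prompt in the code above is the short version; a slightly stricter variant could look like this (illustrative wording, not the exact prompt from the notebook):

GROUNDED_PROMPT = """You are answering questions about EU Green Deal documents.
Use ONLY the context below. If the answer is not in the context, say you don't know.

Context:
{context}

Question:
{question}

Answer, and mention which source you used:"""

response = gemini.generate_content(
    GROUNDED_PROMPT.format(context=context, question=user_question)
)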

Together, these techniques allow users to interact with official documents like the EU Green Deal using natural language — and get clear, trustworthy answers.

Want to try it yourself? You can explore the notebook on Kaggle or fork it for your own documents.