Semantic Search Engine for Expert Witnesses


Project Overview

As part of my role at Advice Company, I re-engineered the search experience for ExpertPages.com, an online directory of expert witnesses. The main technical challenge involved integrating modern NLP algorithms into a legacy website, which required a mix of backend data processing, cloud deployment, and custom front-end design.

The clients for this website are expert witnesses—professionals in medicine, law, business, and technology who testify in court cases involving their expertise. Attorneys rely on websites like ours to connect with expert witnesses based on the specific needs of a given case. Our old search system depended on experts classifying themselves into predefined categories, but this often failed to surface relevant or intuitive results. My new solution used a deep learning model to capture the unique characteristics of each expert and match them with an attorney's specific requirements.

The engine is now used by hundreds of attorneys each month and has become a key selling point for attracting and retaining clients on ExpertPages! Since its launch last November, I've continued to improve its accuracy, manage its cloud infrastructure, and iterate on its design based on user feedback.

Technical Implementation

To build the engine, I first wrote a Python script that uses GPT-3.5 to generate a short summary of each expert from their available metadata, then used a pre-trained BERT-based sentence-transformer model to represent each expert as a high-dimensional vector:

import openai
from sentence_transformers import SentenceTransformer

# generate summaries via metadata & GPT-3.5:
def generate_expert_summaries(df):
    summaries = []
    for _, row in df.iterrows():
        prompt = f"""Create a concise summary of this expert witness:
        Name: {row['name']}
        Field: {row['field']}
        Experience: {row['experience']}
        Education: {row['education']}
        Keep it under 100 words."""

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=150
        )
        summaries.append(response.choices[0].message.content)
    return summaries

# generate embeddings via transformer model:
def generate_embeddings(summaries):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(summaries, show_progress_bar=True)
    return embeddings

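Downstream, these two helpers can be wired together and the resulting vectors stored alongside each expert. Here is a minimal offline sketch of that step — the real pipeline calls GPT-3.5 and MiniLM, which are replaced here by a random stub embedder so the example runs without API keys or model downloads:

```python
import numpy as np
import pandas as pd

def stub_embed(texts, dim=8):
    # stand-in for model.encode(); returns one random vector per summary
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), dim))

# hypothetical expert records, for illustration only:
df = pd.DataFrame({
    'name': ['A. Smith', 'B. Jones'],
    'summary': ['Orthopedic spine surgeon.', 'Forensic accountant.'],
})

# precompute one embedding per expert and store it on the DataFrame,
# matching the 'summary_embedding' column used by the search code below:
embeddings = stub_embed(df['summary'].tolist())
df['summary_embedding'] = list(embeddings)
```

Storing the vectors column-wise like this lets the search step stack them back into a matrix with a single `np.vstack` call.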
Then, I created a program that takes in a query—either a description or keywords—and finds the most relevant experts using cosine similarity with the precomputed embeddings:

import numpy as np

def find_top_matches(query, df, top_n=10):
    # call previous function to get query embedding from API:
    query_embedding = get_query_embedding(query)

    # normalize query embedding:
    query_embedding = query_embedding / np.linalg.norm(query_embedding)

    # stack and normalize the precomputed embeddings (row-wise):
    summary_embeddings = np.vstack(df['summary_embedding'].values)
    summary_embeddings = summary_embeddings / np.linalg.norm(summary_embeddings, axis=1, keepdims=True)

    # compute cosine similarity between query and precomputed embeddings:
    similarities = np.dot(summary_embeddings, query_embedding)

    # apply the modifier to boost the similarity score:
    similarities *= df['modifier'].values

    # return the top_n experts, ranked by boosted similarity:
    top_indices = np.argsort(similarities)[::-1][:top_n]
    return df.iloc[top_indices].assign(similarity=similarities[top_indices])

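As a sanity check on the ranking logic — note that the stacked embeddings must be normalized row-wise (per expert), not by a single scalar norm — here is a toy example with made-up experts and neutral modifier values:

```python
import numpy as np
import pandas as pd

# toy 2-D embeddings: one expert aligned with the query, one orthogonal, one opposite
df = pd.DataFrame({
    'name': ['aligned', 'orthogonal', 'opposite'],
    'summary_embedding': [np.array([1.0, 0.0]),
                          np.array([0.0, 1.0]),
                          np.array([-1.0, 0.0])],
    'modifier': [1.0, 1.0, 1.0],
})

query_embedding = np.array([2.0, 0.0])
query_embedding = query_embedding / np.linalg.norm(query_embedding)

# row-wise normalization, so each expert's vector has unit length:
emb = np.vstack(df['summary_embedding'].values)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

similarities = np.dot(emb, query_embedding) * df['modifier'].values
order = np.argsort(similarities)[::-1]
# cosine similarity is 1 for the aligned expert, 0 for the orthogonal one,
# and -1 for the opposite one, so the ranking follows that order
```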
After optimizing and testing the backend, I deployed the engine using Docker and Google Cloud Run. This allowed it to be accessed via an API, which I integrated onto the existing website using custom JavaScript:

// send the query
function sendQuery() {
  const query = document.getElementById("queryInput").value.trim();
  const loadingElement = document.getElementById("loading");
  const responseElement = document.getElementById("api-response");

  if (!query) {
    responseElement.innerHTML = "Please enter a query.";
    return;
  }

  loadingElement.style.display = "block";
  responseElement.innerHTML = "";

  // package the query for the Cloud Run API:
  const payload = { query: query };

  fetch(serviceUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  })
    .then(response => response.json())
    .then(data => {
      loadingElement.style.display = "none";
      responseElement.innerHTML = renderResults(data);
    })
    .catch(() => {
      loadingElement.style.display = "none";
      responseElement.innerHTML = "Error contacting the search API.";
    });
}

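The containerized deployment itself might look something like the following — the project ID, image name, service name, and region here are placeholders for illustration, not the actual production configuration:

```shell
# build and push the container image (names are placeholders):
docker build -t gcr.io/my-project/expert-search-api .
docker push gcr.io/my-project/expert-search-api

# deploy the image to Cloud Run as a public HTTP service:
gcloud run deploy expert-search-api \
  --image gcr.io/my-project/expert-search-api \
  --region us-central1 \
  --allow-unauthenticated
```

Cloud Run then hands back a service URL, which is what the `serviceUrl` in the JavaScript above points at.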
Finally, I designed the front-end UI with custom HTML and CSS:

<!-- Search Interface -->
<div class="search-container">
  <div class="search-box">
    <input
      type="text"
      id="searchInput"
      placeholder="Describe the expert you&apos;re looking for..."
      class="search-input"
    />
    <button onclick="searchExperts()" class="search-button">
      Search
    </button>
  </div>
  <div id="results" class="results-container"></div>
</div>
<script>
async function searchExperts() {
  const query = document.getElementById('searchInput').value;
  const resultsDiv = document.getElementById('results');

  try {
    const response = await fetch('/api/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query })
    });

    const experts = await response.json();

    resultsDiv.innerHTML = experts.map(expert => `
      <div class="expert-card">
        <h3>${expert.name}</h3>
        <p class="field">${expert.field}</p>
        <p class="summary">${expert.summary}</p>
        <div class="match-score">
          Match: ${(expert.similarity * 100).toFixed(1)}%
        </div>
      </div>
    `).join('');
  } catch (error) {
    resultsDiv.innerHTML = 'Error searching for experts';
  }
}
</script>
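The front-end above assumes the search endpoint returns a JSON array of experts with `name`, `field`, `summary`, and `similarity` keys. A minimal sketch of serializing ranked results into that shape on the Python side — the field names are inferred from the front-end code, and the data below is made up for illustration:

```python
import json
import numpy as np
import pandas as pd

def serialize_results(df, similarities, top_n=10):
    # order experts by similarity and emit the fields the front-end renders:
    order = np.argsort(similarities)[::-1][:top_n]
    experts = [
        {
            'name': df['name'].iloc[i],
            'field': df['field'].iloc[i],
            'summary': df['summary'].iloc[i],
            'similarity': float(similarities[i]),
        }
        for i in order
    ]
    return json.dumps(experts)

# hypothetical data, for illustration only:
df = pd.DataFrame({
    'name': ['A. Expert', 'B. Expert'],
    'field': ['Orthopedics', 'Accounting'],
    'summary': ['Spine surgery specialist.', 'Forensic accountant.'],
})
body = serialize_results(df, np.array([0.91, 0.42]))
```

Casting each score with `float()` matters because NumPy scalars are not JSON-serializable by the standard `json` module.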