[INST]
You are an expert in South American indigenous languages.
Use strictly and only the information below to answer the user question in **English**.
- Do not infer or assume facts that are not explicitly stated.
- If the answer is unknown or insufficient, say \"I cannot answer with the available data.\"
- Limit your answer to 100 words.
### CONTEXT:
{chr(10).join(context)}
### RDF RELATIONS:
{chr(10).join(rdf_facts)}
### QUESTION:
{user_question}
Answer:
[/INST]"""
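    try:
        res = requests.post(
            ENDPOINT_URL,
            headers={"Authorization": f"Bearer {HF_API_TOKEN}", "Content-Type": "application/json"},
            json={"inputs": prompt},
            timeout=60,
        )
        out = res.json()
        # Expect a non-empty list of dicts; anything else (e.g. an error payload) is returned as text.
        if isinstance(out, list) and out and isinstance(out[0], dict) and "generated_text" in out[0]:
            # Strip the echoed prompt, if present, so only the answer remains.
            answer = out[0]["generated_text"].replace(prompt.strip(), "").strip()
            return answer, ids, context, rdf_facts
        return str(out), ids, context, rdf_facts
    except Exception as e:
        # Return the error text in place of the answer so the UI can still render.
        return str(e), ids, context, rdf_facts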
# === MAIN APP ===
def main():
    methods, embedder = load_all_components()
    st.markdown("""
    """, unsafe_allow_html=True)
    with st.expander("**Overview**", expanded=True):
        st.markdown("""
        This app provides **AI-powered analysis** of endangered indigenous languages in South America,
        integrating knowledge graphs from **Glottolog, Wikipedia, and Wikidata**.

        *This is version 1 and currently English-only. Spanish version coming soon!*
        """)
    with st.sidebar:
        st.markdown("### Pontificia Universidad Católica del Perú")
        st.markdown("""
        - Departamento de Humanidades
        - jveraz@pucp.edu.pe
        - Suggestions? Contact us
        """, unsafe_allow_html=True)
        st.markdown("---")
        st.markdown("### Quick Start")
        st.markdown("""
        1. **Type a question** in the input box
        2. **Click 'Analyze'** to compare methods
        3. **Explore results** with expandable details
        """)
        st.markdown("---")
        st.markdown("### Example Queries")
        questions = [
            "What languages are endangered in Brazil?",
            "What languages are spoken in Perú?",
            "Which languages are related to Quechua?",
            "Where is Mapudungun spoken?"
        ]
        for q in questions:
            # st.markdown does not report clicks, so render each example as a button.
            if st.button(q, key=f"example_{q}"):
                st.session_state.query = q
        st.markdown("---")
        st.markdown("### Technical Details")
        st.markdown("""
        - **Embeddings:** Node2Vec vs. GraphSAGE
        - **Language model:** Mistral-7B-Instruct
        - **Knowledge graph:** RDF-based integration
        """, unsafe_allow_html=True)
        st.markdown("---")
        st.markdown("### Data Sources")
        st.markdown("""
        - **Glottolog** (Language classification)
        - **Wikipedia** (Textual summaries)
        - **Wikidata** (Structured facts)
        """)
        st.markdown("---")
        st.markdown("### Analysis Parameters")
        k = st.slider("Number of languages to analyze", 1, 10, 3)
        st.markdown("---")
        st.markdown("### Advanced Options")
        show_ctx = st.checkbox("Show context information", False)
        show_rdf = st.checkbox("Show structured facts", False)
    st.markdown("### Ask About Indigenous Languages")
    query = st.text_input(
        "Enter your question:",
        value=st.session_state.get("query", ""),
        label_visibility="collapsed",
        placeholder="e.g. What languages are spoken in Peru?"
    )
    if st.button("Analyze", type="primary", use_container_width=True):
        if not query:
            st.warning("Please enter a question")
            return
        col1, col2 = st.columns(2)
        for col, (label, method) in zip([col1, col2], methods.items()):
            with col:
                st.markdown(f"#### {label} Method")
                st.caption({
                    "InfoMatch": "Node2Vec embeddings combining text and graph structure",
                    "LinkGraph": "GraphSAGE embeddings capturing network patterns"
                }[label])
                # Time the full retrieval + generation call for this method.
                start = datetime.datetime.now()
                response, lang_ids, context, rdf_data = generate_response(*method, query, k, embedder)
                duration = (datetime.datetime.now() - start).total_seconds()
                # Build the text on one logical line so indentation cannot turn it into a code block.
                st.markdown(
                    f"{response}\n\n⏱️ {duration:.2f}s · {len(lang_ids)} languages",
                    unsafe_allow_html=True,
                )
                if show_ctx:
                    with st.expander(f"Context from {len(lang_ids)} languages"):
                        for lang_id, ctx in zip(lang_ids, context):
                            st.markdown(f"{ctx}", unsafe_allow_html=True)
                if show_rdf:
                    with st.expander("Structured facts (RDF)"):
                        st.code("\n".join(rdf_data))
    st.markdown("---")
    st.markdown("""
    **Note:** This tool is designed for researchers, linguists, and cultural preservationists.
    For best results, use specific questions about languages, families, or regions.
    """, unsafe_allow_html=True)


if __name__ == "__main__":
    main()