DevLog 6-3: Final Project Research and Planning

  • Education x AI: Build a productivity tool that reimagines how students interact with AI and take notes at the same time. This tool will be a canvas like Excalidraw where students can use AI-native features to build up their sketches.
  • Entertainment x AI: Users interact with a web app through audio and video. The app either narrates what it sees or transforms the video in some other way in real-time.
  • Office x AI: Building web agents that replicate actions users take on the web.
  • Sports x AI: Reimagining how athletes can train with AI. Either a vision-language model that monitors games and makes key decisions, or one that monitors training sessions and coaches athletes.
  • Excalidraw - Open-source collaborative whiteboard with a hand-drawn feel. Inspires the canvas-based AI note-taking idea.
  • Mixboard - AI-powered concepting board for exploring and refining ideas visually. Shows how AI can augment creative brainstorming in a canvas UI.
  • Pixellot Coaching - AI-powered sports video analysis platform that auto-tracks players and generates coaching insights. Directly related to the Sports x AI topic.
  • GPT, Claude, and Gemini for AI tooling
  • AWS for hosting microservices and database
  • Nova-Act for web-based agents
  • Fal.ai for image and video generation

I’m exploring AI-powered interactive web experiences using multimodal LLMs (Gemini, Claude, GPT), Fal.ai for media generation, and AWS infrastructure. My focus is building an experimental project that explores AI and its impact on another discipline. The main challenge is integrating multiple modalities (text, vision, audio) into a smooth, low-latency experience.
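One way to keep the multimodal integration manageable is to normalize text, image, and audio inputs into a single ordered list of "parts" before building a request for whichever model API ends up being used. The sketch below is a minimal TypeScript example of that idea; the `Part` shape and `toParts` helper are assumptions of mine, not any vendor's actual API.

```typescript
// Hypothetical normalized message part for a multimodal request.
// Each provider (Gemini, Claude, GPT) has its own wire format, but an
// internal shape like this gives one place to adapt from.
type Part =
  | { kind: "text"; text: string }
  | { kind: "image"; mimeType: string; base64: string }
  | { kind: "audio"; mimeType: string; base64: string };

// Convert heterogeneous browser inputs (typed text, a canvas snapshot,
// a recorded audio clip) into one ordered part list.
function toParts(inputs: {
  text?: string;
  imageBase64?: string;
  audioBase64?: string;
}): Part[] {
  const parts: Part[] = [];
  if (inputs.text) {
    parts.push({ kind: "text", text: inputs.text });
  }
  if (inputs.imageBase64) {
    // MIME types here are placeholders; a canvas export would set the real one.
    parts.push({ kind: "image", mimeType: "image/png", base64: inputs.imageBase64 });
  }
  if (inputs.audioBase64) {
    parts.push({ kind: "audio", mimeType: "audio/webm", base64: inputs.audioBase64 });
  }
  return parts;
}
```

With a shape like this, per-provider adapters become small translation functions, which keeps latency work (streaming, parallel uploads) separate from format concerns.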

  1. A browser extension that watches your screen and generates a live comic strip of your workday.
  2. An AI referee that watches pickup basketball games through a phone camera and calls fouls in real time.
  3. A collaborative whiteboard where every stroke you draw gets auto-completed by AI into a different art style.

My primary concern is connecting my frontend stack with all the AI tools I will be calling via API. I am also concerned about handling multiple modalities such as image, text, and audio, and making them work together for a smooth user experience.
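A common way to address the frontend-to-API concern is to route every model call through one server-side endpoint, so provider API keys never reach the browser and retries, streaming, and timeouts are handled in one place. The sketch below assumes a single `/api/generate` route and placeholder provider names; none of this is a specific vendor's API.

```typescript
// Hypothetical request envelope for a single backend endpoint ("/api/generate").
// The provider names are placeholders for whichever SDKs the server wraps.
interface GenerateRequest {
  provider: "gemini" | "claude" | "gpt" | "fal";
  parts: { kind: "text" | "image" | "audio"; data: string }[];
}

// Build and validate the envelope on the client before sending it.
function buildRequest(
  provider: GenerateRequest["provider"],
  parts: GenerateRequest["parts"],
): GenerateRequest {
  if (parts.length === 0) {
    throw new Error("request needs at least one part");
  }
  return { provider, parts };
}

// Frontend usage sketch (endpoint path is an assumption):
// await fetch("/api/generate", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildRequest("gemini", parts)),
// });
```

Keeping the envelope provider-agnostic means swapping Gemini for Claude, or adding a Fal.ai image call, only touches the server-side adapter, not the frontend.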