DevLog 6-3: Final Project Research and Planning

  • Education x AI: Build a productivity tool that reimagines how students interact with AI and take notes at the same time. This tool will be a canvas like Excalidraw where students can use AI-native features to build up their sketches.
  • Entertainment x AI: Users interact with a web app through audio and video. The app either narrates what it sees or transforms the video in some other way in real-time.
  • Office x AI: Building web agents that replicate actions users take on the web.
  • Sports x AI: Reimagining how athletes can train with AI. Either a vision-language model that monitors games and makes key decisions, or one that monitors training sessions and coaches athletes.
  • Excalidraw - Open-source collaborative whiteboard with a hand-drawn feel. Inspires the canvas-based AI note-taking idea.
  • Mixboard - AI-powered concepting board for exploring and refining ideas visually. Shows how AI can augment creative brainstorming in a canvas UI.
  • Pixellot Coaching - AI-powered sports video analysis platform that auto-tracks players and generates coaching insights. Directly related to the Sports x AI topic.
  • GPT, Claude, and Gemini for AI tooling
  • AWS for hosting microservices and database
  • Nova-Act for web-based agents
  • Fal.ai for image and video generation

I’m exploring AI-powered interactive web experiences using multimodal LLMs (Gemini, Claude, GPT), Fal.ai for media generation, and AWS infrastructure. My focus is building an experimental project that explores AI and its impact on another discipline. The main challenge is integrating multiple modalities (text, vision, audio) into a smooth, low-latency experience.
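One way to keep the multimodal integration manageable is to normalize text, image, and audio inputs into a single ordered list of "parts" before building a request for whichever model API ends up being used. The sketch below is a minimal TypeScript example of that idea; the `Part` shape and `toParts` helper are assumptions of mine, not any vendor's actual API.

```typescript
// Hypothetical normalized message part for a multimodal request.
// Each provider (Gemini, Claude, GPT) has its own wire format, but an
// internal shape like this gives one place to adapt from.
type Part =
  | { kind: "text"; text: string }
  | { kind: "image"; mimeType: string; base64: string }
  | { kind: "audio"; mimeType: string; base64: string };

// Convert heterogeneous browser inputs (typed text, a canvas snapshot,
// a recorded audio clip) into one ordered part list.
function toParts(inputs: {
  text?: string;
  imageBase64?: string;
  audioBase64?: string;
}): Part[] {
  const parts: Part[] = [];
  if (inputs.text) {
    parts.push({ kind: "text", text: inputs.text });
  }
  if (inputs.imageBase64) {
    // MIME types here are placeholders; a canvas export would set the real one.
    parts.push({ kind: "image", mimeType: "image/png", base64: inputs.imageBase64 });
  }
  if (inputs.audioBase64) {
    parts.push({ kind: "audio", mimeType: "audio/webm", base64: inputs.audioBase64 });
  }
  return parts;
}
```

With a shape like this, per-provider adapters become small translation functions, which keeps latency work (streaming, parallel uploads) separate from format concerns.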

  1. A browser extension that watches your screen and generates a live comic strip of your workday.
  2. An AI referee that watches pickup basketball games through a phone camera and calls fouls in real time.
  3. A collaborative whiteboard where every stroke you draw gets auto-completed by AI into a different art style.

My primary concern is connecting my frontend stack with all the AI tools I will be calling via API. I am also concerned about handling multiple modalities such as image, text, and audio, and making them work together for a smooth user experience.
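A common way to address the frontend-to-API concern is to route every model call through one server-side endpoint, so provider API keys never reach the browser and retries, streaming, and timeouts are handled in one place. The sketch below assumes a single `/api/generate` route and placeholder provider names; none of this is a specific vendor's API.

```typescript
// Hypothetical request envelope for a single backend endpoint ("/api/generate").
// The provider names are placeholders for whichever SDKs the server wraps.
interface GenerateRequest {
  provider: "gemini" | "claude" | "gpt" | "fal";
  parts: { kind: "text" | "image" | "audio"; data: string }[];
}

// Build and validate the envelope on the client before sending it.
function buildRequest(
  provider: GenerateRequest["provider"],
  parts: GenerateRequest["parts"],
): GenerateRequest {
  if (parts.length === 0) {
    throw new Error("request needs at least one part");
  }
  return { provider, parts };
}

// Frontend usage sketch (endpoint path is an assumption):
// await fetch("/api/generate", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildRequest("gemini", parts)),
// });
```

Keeping the envelope provider-agnostic means swapping Gemini for Claude, or adding a Fal.ai image call, only touches the server-side adapter, not the frontend.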