Code&Data Insights

[Generative AI] Generative AI | Capabilities of Generative AI

paka_corn 2024. 3. 23. 23:25

Generative AI

: An AI model that generates new content in various formats (text, images, code, audio, video) based on its training data.

Generative AI models are trained on substantial datasets of existing content and learn to generate new content similar to the data they were trained on.

 

- Generative AI starts from a prompt. The prompt can be text, an image, a video, etc.; the model then generates new content as text, images, audio, video, code, or data.

 

- We use generative AI tools to increase productivity, add tangible value to work, save money, and maximize brand value.

 

- The building blocks of generative AI include GANs, VAEs, transformers, and diffusion models.

 

 

Generative AI Models

Large Language Models : Process and generate text

 

1) ChatGPT - text generation

2) DALL-E - image generation

3) Synthesia - video generation

4) Copilot - code generation

 

 

 

Capabilities of Generative AI

1) Text generation

LLMs

- generate human-like text

- learn patterns and structures from data sets

- generate coherent and contextually relevant text

=> text completion, responses, conversation, explanations, summarization

=> Question answering, translation, image and text pairing, conversational interactions

 

(ex) ChatGPT (OpenAI), PaLM (Google)
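
To make "learn patterns from data, then generate text from a prompt" concrete, here is a minimal toy sketch. It is not how an LLM works internally: it swaps in a simple bigram (word-pair) model as a stand-in, and the corpus, function names, and parameters are all made up for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(corpus):
    """Record, for every word, the words that followed it in the corpus."""
    followers = defaultdict(list)
    words = corpus.split()
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word].append(next_word)
    return followers


def generate_text(model, prompt, max_new_words=10, seed=0):
    """Extend the prompt one word at a time by sampling observed followers."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    output = prompt.split()
    if not output:  # nothing to continue from
        return ""
    for _ in range(max_new_words):
        candidates = model.get(output[-1])
        if not candidates:  # the last word was never followed by anything
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical two-sentence "training set" (illustration only).
corpus = (
    "generative ai models learn patterns from data . "
    "generative ai models generate new text from a prompt ."
)
model = train_bigram_model(corpus)
print(generate_text(model, "generative ai", max_new_words=8))
```

The output always begins with the prompt, and every following word is one that actually appeared after the previous word in the corpus. Real LLMs replace the lookup table with a neural network that predicts the next token, but the prompt-in, continuation-out loop has the same shape.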

 

2) Image generation

Leveraging Deep Learning techniques

- GANs, VAEs

=> StyleGAN : high-quality, high-resolution novel images

=> DeepArt : Complex and detailed artwork from a sketch 

=> Novel images based on textual descriptions

 

Applications

- Training data

- Medical imaging

- Scientific visualization

 

 

3) Audio generation

- Musical compositions

- Text-to-speech (TTS) audio

- Synthetic voices

- Natural-sounding speech

=> WaveGAN : raw audio waveforms, realistic sounds

=> OpenAI's MuseNet : original music in various genres and instrumentations

=> Google's Tacotron 2, Mozilla TTS : advanced TTS, highly realistic synthetic speech (tone, pitch, pronunciation)

 

 

4) Video generation

- Create anything from basic animations to complex scenes

- Transform images into dynamic videos

- Incorporate temporal coherence

- Exhibit smooth motion and transitions

=> Textual prompt-based new videos

=> Users can specify desired content

=> Users guide video generation process

 

 

5) Code generation

- Code Snippets

- Functions

- Complete programs

- Debugging

=> GitHub Copilot, IBM Watson Code Assistant

 

 

6) Data generation and augmentation

- Generate new data (samples)

=> Augment existing datasets (image/text/tabular data/statistical distribution)

=> Increase the diversity and variability of data
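
The two ideas above (augmenting an existing dataset, and generating new samples from a statistical distribution) can be sketched with the Python standard library alone. This is a simplified stand-in rather than a generative model: the `heights_cm` values are invented, and the function names, the Gaussian assumption, and the `noise_scale` parameter are illustrative choices.

```python
import random
import statistics


def augment_with_noise(samples, copies_per_sample=2, noise_scale=0.05, seed=0):
    """Keep the originals and add jittered copies of each sample.

    The noise standard deviation is noise_scale times the data's own
    standard deviation, so the copies stay close to the real values.
    """
    rng = random.Random(seed)
    spread = statistics.stdev(samples)
    augmented = list(samples)
    for value in samples:
        for _ in range(copies_per_sample):
            augmented.append(value + rng.gauss(0.0, noise_scale * spread))
    return augmented


def sample_from_fitted_gaussian(samples, n_new, seed=0):
    """Fit a normal distribution to the samples and draw new values from it."""
    rng = random.Random(seed)
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return [rng.gauss(mean, spread) for _ in range(n_new)]


# Hypothetical tabular column (made-up values, illustration only).
heights_cm = [158.0, 162.5, 171.0, 175.5, 180.0]

augmented = augment_with_noise(heights_cm)
synthetic = sample_from_fitted_gaussian(heights_cm, n_new=5)
print(len(heights_cm), len(augmented), len(synthetic))  # prints: 5 15 5
```

Jittering keeps each new point near a real one, which increases variability without drifting far from the observed data. Sampling from a fitted distribution produces entirely new points, but only as good as the distribution assumption behind it.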

 

 

7) Virtual worlds

- Virtual avatars simulating realistic behaviours

- Complex virtual environments with realistic textures

=> Metaverse

 

 

 

Reference : [IBM|Coursera] Generative AI: Introduction and Applications
