AI conversations transcript
9/20/2023

ConvoKit (the Cornell Conversational Analysis Toolkit) contains tools to extract conversational features and analyze social phenomena in conversations, using a single unified interface inspired by (and compatible with) scikit-learn. Several large conversational datasets are included, together with scripts showing how to use the toolkit on these datasets. The latest version is 3.0.0 (released July 17, 2023); follow the project on GitHub to keep track of updates. Read our documentation or try ConvoKit in our interactive tutorial.

The toolkit currently implements features for:

Linguistic coordination (API)
A measure of linguistic influence (and relative power) between individuals or groups, based on their use of function words.
Example: exploring the balance of power in the U.S. Supreme Court.

Politeness strategies (API)
A set of lexical and parse-based features correlating with politeness and impoliteness.
Example: understanding the (mis)use of politeness strategies in conversations gone awry on Wikipedia.

Hypergraph conversation representation (API)
A method for extracting structural features of conversations through a hypergraph representation.
Example: hypergraph creation, feature extraction, visualization, and interpretation on a subsample of Reddit.

Expected Conversational Context Framework (API)
A framework for characterizing utterances and terms based on their expected conversational context, consisting of model implementations and wrapper pipelines.
Examples: deriving question types and other characterizations in British parliamentary question periods; exploring the Switchboard Dialog Act Corpus; examining Wikipedia talk page discussions; and computing the orientation of justices' utterances in the U.S. Supreme Court.

CRAFT: Online forecasting of conversational outcomes (API)
A neural model for forecasting future outcomes of conversations (e.g., derailment into personal attacks) as they develop.
Available as an interactive notebook: full version (fine-tuning + inference) or inference-only.

Linguistic diversity in conversations (API)
A method to compute the linguistic diversity of individuals within their own conversations, and relative to other individuals in a population.
Example: speaker-conversation attributes and diversity on ChangeMyView.

Datasets
ConvoKit ships with several datasets ready for use "out of the box". These datasets can be downloaded using the convokit.download() helper function; alternatively, they can be accessed directly.

Conversations Gone Awry Datasets
Two related corpora of conversations that derail into antisocial behavior. One corpus consists of Wikipedia talk page conversations that derail into personal attacks, as labeled by crowdworkers (4,188 conversations containing 30,021 comments). The other consists of discussion threads on the subreddit ChangeMyView (CMV) that derail into rule-violating behavior, as determined by the presence of a moderator intervention (6,842 conversations containing 42,964 comments).
Name for download: conversations-gone-awry-corpus (Wikipedia version) or conversations-gone-awry-cmv-corpus (Reddit CMV version)

Cornell Movie-Dialogs Corpus
A large, metadata-rich collection of fictional conversations extracted from raw movie scripts (220,579 conversational exchanges between 10,292 pairs of movie characters in 617 movies).
Name for download: movie-corpus

Parliament Question Time Corpus
British parliamentary question periods from May 1979 to December 2016 (216,894 question-answer pairs).
Name for download: parliament-corpus

Supreme Court Corpus
A collection of conversations from the U.S. Supreme Court.
Name for download: supreme-corpus

Wikipedia Talk Pages Corpus
A medium-size collection of conversations from Wikipedia editors' talk pages.
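ConvoKit describes its interface as inspired by scikit-learn, which means each analysis runs as a transformer with separate fit and transform steps. The Python sketch below imitates that pattern on a plain list of dicts. The class names, the dict-based "corpus", and the metadata keys are hypothetical and for illustration only. They are not ConvoKit's own Corpus or Transformer classes.

```python
from typing import Any

# Hypothetical stand-in for an utterance: {"text": ..., "meta": {...}}.
Utterance = dict[str, Any]


class Transformer:
    """Hypothetical base class mirroring the scikit-learn fit/transform pattern."""

    def fit(self, corpus: list[Utterance]) -> "Transformer":
        return self  # stateless by default: nothing to learn

    def transform(self, corpus: list[Utterance]) -> list[Utterance]:
        raise NotImplementedError

    def fit_transform(self, corpus: list[Utterance]) -> list[Utterance]:
        return self.fit(corpus).transform(corpus)


class LengthAnnotator(Transformer):
    """Stateless step: adds a 'num_tokens' metadata field to every utterance (in place)."""

    def transform(self, corpus):
        for utt in corpus:
            utt.setdefault("meta", {})["num_tokens"] = len(utt["text"].split())
        return corpus


class RelativeLength(Transformer):
    """Stateful step: learns the mean length in fit(), then annotates relative to it."""

    def fit(self, corpus):
        lengths = [utt["meta"]["num_tokens"] for utt in corpus]
        if not lengths:
            raise ValueError("cannot fit on an empty corpus")
        self.mean_ = sum(lengths) / len(lengths)
        return self

    def transform(self, corpus):
        for utt in corpus:
            utt["meta"]["relative_length"] = utt["meta"]["num_tokens"] / self.mean_
        return corpus


def run_pipeline(steps: list[Transformer], corpus: list[Utterance]) -> list[Utterance]:
    """Apply each step's fit_transform in order, passing the annotated corpus along."""
    for step in steps:
        corpus = step.fit_transform(corpus)
    return corpus


if __name__ == "__main__":
    corpus = [{"text": "a b"}, {"text": "a b c d"}]
    for utt in run_pipeline([LengthAnnotator(), RelativeLength()], corpus):
        print(utt["meta"])
```

Because every step shares the same two methods, steps can be chained without knowing what the others compute. This uniformity is the design benefit the scikit-learn comparison points to.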
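The linguistic coordination measure listed above can be made concrete using a formulation common in the coordination literature. The coordination of speaker B towards speaker A on a marker class m is the probability that B's reply exhibits m, given that A's triggering utterance does, minus the overall probability that B's replies exhibit m. The Python sketch below makes two simplifying assumptions: a toy marker class (three English articles) and naive whitespace tokenization. ConvoKit handles function-word categories and aggregation in its own way, so treat this sketch as an illustration of the idea only.

```python
from typing import Iterable

# Hypothetical marker class for illustration only: three English articles.
ARTICLES = frozenset({"a", "an", "the"})


def exhibits(text: str, markers: frozenset) -> bool:
    """True if any whitespace-separated token (punctuation stripped, lowercased) is a marker."""
    return any(tok.strip(".,!?;:").lower() in markers for tok in text.split())


def coordination(exchanges: Iterable[tuple[str, str]], markers: frozenset) -> float:
    """Coordination of replier B towards speaker A on one marker class.

    `exchanges` holds (utterance by A, reply by B) pairs. Returns
    P(reply exhibits m | A's utterance exhibits m) - P(reply exhibits m).
    """
    pairs = [(exhibits(utt, markers), exhibits(reply, markers)) for utt, reply in exchanges]
    triggered = [reply_has for utt_has, reply_has in pairs if utt_has]
    if not triggered:
        raise ValueError("no triggering utterance exhibits the marker class")
    p_conditional = sum(triggered) / len(triggered)
    p_baseline = sum(reply_has for _, reply_has in pairs) / len(pairs)
    return p_conditional - p_baseline


if __name__ == "__main__":
    exchanges = [
        ("Is the brief ready?", "Yes, the brief is ready."),
        ("Did you file it?", "Not yet."),
        ("Send me a copy.", "I sent a copy."),
        ("Thanks.", "Sure, the pleasure is mine."),
    ]
    print(coordination(exchanges, ARTICLES))  # prints 0.25
```

A positive score means B uses the marker more often right after A uses it than B does in general. Comparing such scores in both directions is what lets coordination serve as a proxy for relative power.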
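To show what a lexical politeness feature looks like in practice, here is a small Python sketch that flags a handful of strategies using regular expressions. The strategy names and cue words are hypothetical choices for this example, not ConvoKit's lexicon. The sketch also omits the parse-based features mentioned in the politeness strategies entry.

```python
import re

# Hypothetical, hand-picked lexical cues for illustration; not ConvoKit's lexicon.
STRATEGIES = {
    "please": re.compile(r"\bplease\b", re.IGNORECASE),
    "gratitude": re.compile(r"\b(thanks|thank you|appreciate)\b", re.IGNORECASE),
    "apology": re.compile(r"\b(sorry|apologi[sz]e)\b", re.IGNORECASE),
    "hedge": re.compile(r"\b(maybe|perhaps|i think|i guess)\b", re.IGNORECASE),
    # Utterance opens with a wh-word, a blunt way of asking.
    "direct_question": re.compile(r"^\s*(what|why|who|how)\b", re.IGNORECASE),
}


def politeness_features(text: str) -> dict[str, int]:
    """Return a 0/1 indicator for each strategy whose cue appears in the utterance."""
    return {name: int(bool(pattern.search(text))) for name, pattern in STRATEGIES.items()}


if __name__ == "__main__":
    print(politeness_features("Could you please take a look? Thanks!"))
    # prints {'please': 1, 'gratitude': 1, 'apology': 0, 'hedge': 0, 'direct_question': 0}
```

Binary indicators like these can be aggregated per speaker or per conversation. For example, you could compare how often conversations that later derail open with direct questions rather than hedges.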
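Broadly, the hypergraph conversation representation models who replies to whom. The Python sketch below keeps only a simplified slice of that idea. It collapses utterance-level reply links into a weighted speaker-to-speaker graph, then reads off in-degree, out-degree, and the number of reciprocated speaker pairs. Both the (utterance_id, speaker, reply_to) input format and the chosen statistics are assumptions for illustration. ConvoKit's hypergraph features are considerably richer.

```python
from collections import defaultdict
from typing import Optional


def speaker_reply_stats(utterances: list[tuple[str, str, Optional[str]]]):
    """Summarize reply structure between speakers.

    `utterances` holds (utterance_id, speaker, reply_to_id or None) triples.
    Returns (edge weights, in-degree, out-degree, reciprocated pair count), where
    degrees count distinct speakers, not individual replies.
    """
    speaker_of = {uid: speaker for uid, speaker, _ in utterances}
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for _, speaker, parent in utterances:
        if parent is not None and parent in speaker_of:
            target = speaker_of[parent]
            if target != speaker:  # ignore speakers replying to themselves
                edges[(speaker, target)] += 1

    in_deg: dict[str, int] = defaultdict(int)
    out_deg: dict[str, int] = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1

    # Each reciprocated pair appears twice, once as (a, b) and once as (b, a).
    reciprocal = sum(1 for (a, b) in edges if (b, a) in edges) // 2
    return dict(edges), dict(in_deg), dict(out_deg), reciprocal


if __name__ == "__main__":
    thread = [
        ("u1", "alice", None),
        ("u2", "bob", "u1"),
        ("u3", "alice", "u2"),
        ("u4", "carol", "u1"),
        ("u5", "bob", "u1"),
    ]
    print(speaker_reply_stats(thread))
```

Statistics such as how concentrated in-degree is on one speaker, or how many exchanges are reciprocated, help distinguish a focused back-and-forth from a broadcast-style thread.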
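One way to quantify the linguistic diversity described above is cross-entropy, which measures how surprising a speaker's words are under a language model built from some reference text. Scoring a speaker's utterances against their own other conversations gives a "within" measure. Scoring them against other speakers' text gives a "between" measure. The Python sketch below assumes a unigram model with add-one smoothing and naive tokenization. It is not ConvoKit's exact procedure.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer: lowercases and strips surrounding punctuation."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def cross_entropy(target: list[str], reference: list[str]) -> float:
    """Average bits per target token under an add-one-smoothed unigram model of `reference`."""
    if not target:
        raise ValueError("target must contain at least one token")
    counts = Counter(reference)
    vocab = set(reference) | set(target)
    denominator = len(reference) + len(vocab)  # add-one smoothing over the joint vocabulary
    bits = -sum(math.log2((counts[tok] + 1) / denominator) for tok in target)
    return bits / len(target)


if __name__ == "__main__":
    reference = tokenize("The cat sat on the mat.")
    print(cross_entropy(tokenize("the cat sat"), reference))             # lower: familiar words
    print(cross_entropy(tokenize("quantum flux capacitor"), reference))  # higher: unseen words
```

Higher cross-entropy means the target text diverges more from the reference. In a within/between comparison, a speaker who scores high against their own history but low against the population is adapting their language toward the group.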