
I am currently developing a text classification tool using Keras. It works fine (I got up to 98.7% validation accuracy), but I can't wrap my head around how exactly a 1D-convolution layer works with text data.

I have the following sentences (input data):

- Maximum words in a sentence: 951 (shorter sentences are padded to this length; see the sketch after this list).
- Amount of sentences (for training): 9800.
- embedding_vecor_length: 32 (the length of each word's embedding vector).
- batch_size: 37 (it doesn't matter for this question).
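Padding like this is typically done with Keras' pad_sequences; here is a minimal sketch (the sequences are made up for illustration):

```python
from keras.preprocessing.sequence import pad_sequences

# Two hypothetical tokenized sentences (lists of word indices) of different lengths.
sequences = [[4, 12, 7], [9, 2]]

# Pad every sentence at the end up to the maximum length (951 in the question).
padded = pad_sequences(sequences, maxlen=951, padding='post')
print(padded.shape)  # (2, 951)
```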
It's a very simple model (I have made more complicated structures but, strangely, this simple one works better, even without using LSTM):

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Flatten, Dense

model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(Flatten())  # flatten the conv feature map so the softmax sees one vector per sentence
model.add(Dense(labels_count, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```

My main question is: what hyper-parameters should I use for the Conv1D layer?

```python
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
```

Does it mean that filters=32 will only scan the first 32 words, completely discarding the rest (with kernel_size=2)? And should I set filters to 951 (the maximum number of words in a sentence)? As I picture it, the first step of the convolution layer covers two words at a time (stride 2), and if filters = 32, the layer repeats this 32 times. Am I correct? So I would never get to, say, the 156-th word in the sentence, and thus that information would be lost?
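As a quick sanity check of the shapes involved, here is a self-contained sketch with hypothetical values plugged in for top_words, max_review_length, and labels_count (none of these come from the question):

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Flatten, Dense

top_words = 10000            # hypothetical vocabulary size
embedding_vecor_length = 32
max_review_length = 951
labels_count = 5             # hypothetical number of classes

model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(Flatten())
model.add(Dense(labels_count, activation='softmax'))
model.summary()
# Embedding output: (None, 951, 32)
# Conv1D output:    (None, 951, 32), one 951-long feature map per filter
```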

Answer:

I will try to explain how 1D convolution is applied to sequence data. I use the example of a sentence consisting of words, but obviously it is not specific to text data; it is the same with any other sequence data or time series.

Suppose we have a sentence consisting of m words, where each word has been represented using word embeddings (so the sentence is an m x d matrix, d being the embedding length). Now we would like to apply a 1D convolution layer consisting of n different filters with a kernel size of k to this data. To do so, sliding windows of length k are extracted from the data, and then each filter is applied to each of those extracted windows.
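A minimal NumPy sketch of this window extraction (m, k, and the embedding length d are arbitrary here):

```python
import numpy as np

m, d, k = 10, 4, 3                 # words, embedding length, kernel size
sentence = np.random.rand(m, d)    # one sentence as an m x d matrix

# All sliding windows of length k (stride 1, no padding): m - k + 1 of them.
windows = [sentence[i:i + k] for i in range(m - k + 1)]
print(len(windows), windows[0].shape)  # 8 (3, 4)
```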

Here is an illustration of what happens (here I have assumed k=3 and removed the bias parameter of each filter for simplicity):

[Figure: the sliding windows of length k over the sentence and the response of each filter on each window.]

As you can see in the figure above, the response of each filter is equivalent to the result of its convolution (i.e. element-wise multiplication and then summing all the results) with the extracted window of length k (i.e. the i-th to (i+k-1)-th words of the given sentence). Further, note that each filter has the same number of channels as the number of features (i.e. the dimension of the word embeddings).
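To make that concrete, here is a self-contained NumPy sketch (again with arbitrary sizes) that computes the responses of n such filters exactly as described, element-wise multiplication with each window followed by a sum, bias omitted as in the figure:

```python
import numpy as np

m, d, k, n = 10, 4, 3, 5           # words, embedding length, kernel size, filters
sentence = np.random.rand(m, d)
filters = np.random.rand(n, k, d)  # each filter spans k words and all d channels

# Response of filter j at position i: element-wise product with the
# window covering words i .. i+k-1, summed over all entries.
feature_map = np.array([
    [np.sum(sentence[i:i + k] * filters[j]) for j in range(n)]
    for i in range(m - k + 1)
])
print(feature_map.shape)  # (8, 5), i.e. (m - k + 1, n)
```

This should match what Conv1D(filters=n, kernel_size=k) produces for a single sentence (before the activation, with the default valid padding and stride 1): the sequence length shrinks to m - k + 1, and the filter count n becomes the number of output channels.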
