Implementing Seq2Seq Models for Text Summarization With Keras

Nov 04, 2024

In this tutorial we’ll cover the second part of this series on encoder-decoder sequence-to-sequence RNNs: how to build, train, and test our seq2seq model for text summarization using Keras.

Prerequisites

To follow along with this article, you will need experience with Python code and a beginner’s understanding of deep learning. We will operate under the assumption that all readers have access to sufficiently powerful machines, so they can run the code provided.

If you do not have access to a GPU, we suggest accessing one through the cloud.

For instructions on getting started with Python code, we recommend trying this beginner’s guide to set up your system and prepare to run beginner tutorials.

Step 7: Creating the Model

First, import all the necessary libraries.

import numpy as np

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, \
    Concatenate, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

Next, define the encoder and decoder networks.

Encoder

The input length that the encoder accepts is equal to the maximum text length, which you already estimated in Step 3. This is then fed to an Embedding layer of dimension (total number of words captured in the text vocabulary) x (number of nodes in the embedding layer); the vocabulary size is the x_voc variable calculated in Step 5. This is followed by three stacked LSTM layers, where each layer returns the LSTM output sequence as well as the hidden and cell states observed at the previous time steps.
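To make the shapes concrete, here is a minimal standalone sketch of the encoder stack, using the dimensions that appear in the full model definition and summary later in this step (max_text_len = 100, embedding_dim = 200, latent_dim = 300; the vocabulary size of 29,638 is implied by the embedding layer’s parameter count). It only illustrates how the tensors flow; the actual model is built further below.

# Standalone sketch of the encoder stack (illustration only; the real model,
# including dropout and the decoder, is defined later in this step).
from tensorflow.keras.layers import Input, Embedding, LSTM

max_text_len = 100    # maximum text length from Step 3
x_voc = 29638         # text vocabulary size from Step 5 (implied by the model summary below)
embedding_dim, latent_dim = 200, 300

inp = Input(shape=(max_text_len,))                 # (None, 100)
emb = Embedding(x_voc, embedding_dim)(inp)         # (None, 100, 200)
out1, h1, c1 = LSTM(latent_dim, return_sequences=True, return_state=True)(emb)
out2, h2, c2 = LSTM(latent_dim, return_sequences=True, return_state=True)(out1)
enc_out, h, c = LSTM(latent_dim, return_sequences=True, return_state=True)(out2)
# enc_out: (None, 100, 300); h and c: (None, 300) each -> handed to the decoder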

Decoder

In the decoder, an embedding layer is defined, followed by an LSTM network. The initial state of this LSTM is the last hidden and cell states taken from the encoder. The output of the LSTM is fed to a Dense layer wrapped in a TimeDistributed layer with a softmax activation function.

Altogether, the model accepts the encoder input (text) and the decoder input (summary), and it outputs the summary. Prediction happens by predicting the upcoming word of the summary from the previous word of the summary (see the figure below).

[Figure: the model predicts each word of the summary from the previous word]

Consider the summary line to be “I want every person to laugh”. The model has to accept two inputs: the actual text and the summary. During the training phase, the decoder accepts the input summary given to the model and learns every word that has to follow a certain given word. During the test phase, it then generates predictions using an inference model.
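As a minimal sketch of that training setup (teacher forcing), here is how the decoder’s input and target line up for the example summary above, assuming the sostok and eostok markers added to every summary earlier in this series; the same one-step shift is what the slicing in Step 8’s model.fit() call implements.

# Toy illustration of teacher forcing on the example summary above.
# 'sostok' and 'eostok' are the start/end tokens wrapped around each summary.
summary = ['sostok', 'i', 'want', 'every', 'person', 'to', 'laugh', 'eostok']

decoder_input = summary[:-1]    # what the decoder sees at each step
decoder_target = summary[1:]    # what it should predict next

for given_word, next_word in zip(decoder_input, decoder_target):
    print(f'{given_word:>8} -> {next_word}')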

Add the following code to define your network architecture.

latent_dim = 300
embedding_dim = 200

# Encoder
encoder_inputs = Input(shape=(max_text_len, ))

# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim, trainable=True)(encoder_inputs)

# Encoder LSTM 1
encoder_lstm1 = LSTM(latent_dim, return_sequences=True, return_state=True,
                     dropout=0.4, recurrent_dropout=0.4)
(encoder_output1, state_h1, state_c1) = encoder_lstm1(enc_emb)

# Encoder LSTM 2
encoder_lstm2 = LSTM(latent_dim, return_sequences=True, return_state=True,
                     dropout=0.4, recurrent_dropout=0.4)
(encoder_output2, state_h2, state_c2) = encoder_lstm2(encoder_output1)

# Encoder LSTM 3
encoder_lstm3 = LSTM(latent_dim, return_state=True, return_sequences=True,
                     dropout=0.4, recurrent_dropout=0.4)
(encoder_outputs, state_h, state_c) = encoder_lstm3(encoder_output2)

# Decoder
decoder_inputs = Input(shape=(None, ))

# Decoder embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

# Decoder LSTM, initialized with the encoder's final hidden and cell states
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,
                    dropout=0.4, recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
    decoder_lstm(dec_emb, initial_state=[state_h, state_c])

# Dense softmax layer over the summary vocabulary, applied at every time step
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 100)]        0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 100, 200)     5927600     input_1[0][0]
__________________________________________________________________________________________________
lstm (LSTM)                     [(None, 100, 300), ( 601200      embedding[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, 100, 300), ( 721200      lstm[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 200)    2576600     input_2[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 100, 300), ( 721200      lstm_1[0][0]
__________________________________________________________________________________________________
lstm_3 (LSTM)                   [(None, None, 300),  601200      embedding_1[0][0]
                                                                 lstm_2[0][1]
                                                                 lstm_2[0][2]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 12883)  3877783     lstm_3[0][0]
==================================================================================================
Total params: 15,026,783
Trainable params: 15,026,783
Non-trainable params: 0
__________________________________________________________________________________________________

Step 8: Training the Model

In this step, compile the model and define EarlyStopping to stop training once the validation loss metric has stopped decreasing.

model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)

Next, use the model.fit() method to fit the training data, with the batch size set to 128. Send the text and the summary (excluding the last word of the summary) as the input, and a reshaped summary tensor comprising every word starting from the second word as the output (this is what teaches the model to predict a word given the previous word). Additionally, to enable validation during the training phase, send the validation data as well.

history = model.fit(
    [x_tr, y_tr[:, :-1]],
    y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
    epochs=50,
    callbacks=[es],
    batch_size=128,
    validation_data=([x_val, y_val[:, :-1]],
                     y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:]),
    )

Train on 88513 samples, validate on 9835 samples
Epoch 1/50
88513/88513 [==============================] - 426s 5ms/sample - loss: 5.1520 - val_loss: 4.8026
Epoch 2/50
88513/88513 [==============================] - 412s 5ms/sample - loss: 4.7110 - val_loss: 4.5082
Epoch 3/50
88513/88513 [==============================] - 412s 5ms/sample - loss: 4.4448 - val_loss: 4.2815
Epoch 4/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 4.2487 - val_loss: 4.1264
Epoch 5/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 4.1049 - val_loss: 4.0170
Epoch 6/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.9968 - val_loss: 3.9353
Epoch 7/50
88513/88513 [==============================] - 412s 5ms/sample - loss: 3.9086 - val_loss: 3.8695
Epoch 8/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.8321 - val_loss: 3.8059
Epoch 9/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.7598 - val_loss: 3.7517
Epoch 10/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.6948 - val_loss: 3.7054
Epoch 11/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.6408 - val_loss: 3.6701
Epoch 12/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.5909 - val_loss: 3.6376
Epoch 13/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.5451 - val_loss: 3.6075
Epoch 14/50
88513/88513 [==============================] - 412s 5ms/sample - loss: 3.5065 - val_loss: 3.5879
Epoch 15/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.4690 - val_loss: 3.5552
Epoch 16/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.4322 - val_loss: 3.5308
Epoch 17/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.3981 - val_loss: 3.5123
Epoch 18/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.3683 - val_loss: 3.4956
Epoch 19/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.3379 - val_loss: 3.4787
Epoch 20/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.3061 - val_loss: 3.4594
Epoch 21/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.2803 - val_loss: 3.4412
Epoch 22/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.2552 - val_loss: 3.4284
Epoch 23/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.2337 - val_loss: 3.4168
Epoch 24/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.2123 - val_loss: 3.4148
Epoch 25/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.1924 - val_loss: 3.3974
Epoch 26/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.1727 - val_loss: 3.3869
Epoch 27/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.1546 - val_loss: 3.3853
Epoch 28/50
88513/88513 [==============================] - 408s 5ms/sample - loss: 3.1349 - val_loss: 3.3778
Epoch 29/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.1188 - val_loss: 3.3637
Epoch 30/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.1000 - val_loss: 3.3544
Epoch 31/50
88513/88513 [==============================] - 413s 5ms/sample - loss: 3.0844 - val_loss: 3.3481
Epoch 32/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 3.0680 - val_loss: 3.3407
Epoch 33/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.0531 - val_loss: 3.3374
Epoch 34/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 3.0377 - val_loss: 3.3314
Epoch 35/50
88513/88513 [==============================] - 408s 5ms/sample - loss: 3.0214 - val_loss: 3.3186
Epoch 36/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 3.0041 - val_loss: 3.3128
Epoch 37/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 2.9900 - val_loss: 3.3195
Epoch 38/50
88513/88513 [==============================] - 407s 5ms/sample - loss: 2.9784 - val_loss: 3.3007
Epoch 39/50
88513/88513 [==============================] - 408s 5ms/sample - loss: 2.9655 - val_loss: 3.2975
Epoch 40/50
88513/88513 [==============================] - 410s 5ms/sample - loss: 2.9547 - val_loss: 3.2889
Epoch 41/50
88513/88513 [==============================] - 408s 5ms/sample - loss: 2.9424 - val_loss: 3.2923
Epoch 42/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 2.9331 - val_loss: 3.2753
Epoch 43/50
88513/88513 [==============================] - 411s 5ms/sample - loss: 2.9196 - val_loss: 3.2847
Epoch 44/50
88513/88513 [==============================] - 409s 5ms/sample - loss: 2.9111 - val_loss: 3.2718
Epoch 45/50
50688/88513 [================>.............] - ETA: 2:48 - loss: 2.8809

Next, plot the training and validation loss metrics observed during the training phase.

from matplotlib import pyplot

pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()


Train and Validation Loss (Loss v/s Epoch)

Step 9: Generating Predictions

Now that we’ve trained the model, to generate summaries from given pieces of text, first build a reverse mapping from indices to words (the indices were previously generated using texts_to_sequences in Step 5). Also, map words to indices using the summaries tokenizer; this will be used to detect the start and end tokens of the sequences.

reverse_target_word_index = y_tokenizer.index_word
reverse_source_word_index = x_tokenizer.index_word
target_word_index = y_tokenizer.word_index
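As a quick sanity check (assuming the x_tokenizer and y_tokenizer fitted in Step 5), you can verify that the two dictionaries are inverses of each other and that the start token is present in the summary vocabulary:

# Optional sanity check on the mappings built above.
print(target_word_index['sostok'])                              # integer id of the start token
print(reverse_target_word_index[target_word_index['sostok']])   # should print 'sostok'
print(len(reverse_source_word_index))                           # number of words known to the text tokenizer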

Now define the encoder and decoder inference models to start making predictions. Use the tensorflow.keras.Model() object to create your inference models.

The encoder inference model accepts text and returns the output generated by the three LSTMs along with the hidden and cell states. The decoder inference model accepts the start-of-sequence identifier (sostok) and predicts the upcoming word, eventually leading to predicting the whole summary.

Add the following code to define the inference models’ architecture.

# Encoder inference model: encode the input sequence and return the states
encoder_model = Model(inputs=encoder_inputs,
                      outputs=[encoder_outputs, state_h, state_c])

# Inputs holding the states from the previous time step
decoder_state_input_h = Input(shape=(latent_dim, ))
decoder_state_input_c = Input(shape=(latent_dim, ))
decoder_hidden_state_input = Input(shape=(max_text_len, latent_dim))

# Embeddings of the decoder sequence
dec_emb2 = dec_emb_layer(decoder_inputs)

# Set the initial states to the states from the previous time step
(decoder_outputs2, state_h2, state_c2) = decoder_lstm(dec_emb2,
        initial_state=[decoder_state_input_h, decoder_state_input_c])

# Softmax distribution over the summary vocabulary
decoder_outputs2 = decoder_dense(decoder_outputs2)

# Decoder inference model
decoder_model = Model([decoder_inputs] + [decoder_hidden_state_input,
                      decoder_state_input_h, decoder_state_input_c],
                      [decoder_outputs2] + [state_h2, state_c2])

Now define a function decode_sequence() which accepts the input text and outputs the predicted summary. Start with sostok and keep generating words until eostok is encountered or the maximum length of the summary is reached. Predict the upcoming word from a given word by choosing the word with the maximum probability, and update the internal state of the decoder accordingly.

def decode_sequence(input_seq):
    # Encode the input text to get the encoder outputs and states
    (e_out, e_h, e_c) = encoder_model.predict(input_seq)

    # Start with a target sequence of length 1 containing the start token
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = target_word_index['sostok']

    stop_condition = False
    decoded_sentence = ''

    while not stop_condition:
        (output_tokens, h, c) = decoder_model.predict([target_seq]
                                                      + [e_out, e_h, e_c])

        # Pick the word with the maximum probability
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_token = reverse_target_word_index[sampled_token_index]

        if sampled_token != 'eostok':
            decoded_sentence += ' ' + sampled_token

        # Stop when the end token is found or the maximum length is reached
        if sampled_token == 'eostok' or len(decoded_sentence.split()) \
                >= max_summary_len - 1:
            stop_condition = True

        # Update the target sequence (of length 1) and the internal states
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        (e_h, e_c) = (h, c)

    return decoded_sentence

Define two functions, seq2summary() and seq2text(), which convert the numeric representation of the summary and the text, respectively, into their string representation.

def seq2summary(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0 and i != target_word_index['sostok'] and i \
                != target_word_index['eostok']:
            newString = newString + reverse_target_word_index[i] + ' '
    return newString


def seq2text(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0:
            newString = newString + reverse_source_word_index[i] + ' '
    return newString

Finally, generate the predictions by sending in the text.

for i in range(0, 19):
    print('Review:', seq2text(x_tr[i]))
    print('Original summary:', seq2summary(y_tr[i]))
    print('Predicted summary:', decode_sequence(x_tr[i].reshape(1, max_text_len)))
    print('\n')

Here are a few notable summaries generated by the RNN model.

Review: us president donald trump on wednesday said that north korea has returned the remains of 200 us troops missing from the korean war though there was no official confirmation from military authorities north korean leader kim jong un had agreed to return the remains during his summit with trump about 700 us troops remain unaccounted from the 1950 1953 korean war
Original summary: start n korea has returned remains of 200 us war dead trump end
Predicted summary: start n korea has lost a war against us trump end

Review: pope francis has said that history will judge those who refuse to accept the science of climate change if a person is doubtful that climate change is true they should ask scientists the pope added notably us president donald trump who believes global warming is a chinese conspiracy withdrew the country from the paris climate agreement
Original summary: start history will judge those denying climate change pope end
Predicted summary: start pope francis will be in paris climate deal prez end

Review: the enforcement directorate ed has attached assets worth over ₹33 500 crore in the over 3 year tenure of its chief karnal singh who retires sunday officials said the agency filed about 390 in connection with its money laundering probes during the period the government on saturday appointed indian revenue service irs officer sanjay kumar mishra as interim ed chief
Original summary: start enforcement attached assets worth ₹33 500 cr in yrs end
Predicted summary: start ed attaches assets worth 100 crore in india in days end

Review: lok janshakti party president ram vilas paswan daughter asha has said she will contest elections against him from constituency if given ticket from lalu prasad yadav rjd she accused him of neglecting her and promoting his son chirag asha is paswan daughter from his first wife while chirag is his son from his second wife
Original summary: start will contest against father ram vilas from daughter end
Predicted summary: start lalu son tej pratap to contest his daughter in 2019 end

Review: irish deputy prime minister frances fitzgerald announced her resignation on tuesday in order to avoid the collapse of the government and possible snap election she quit hours before no confidence motion was to be proposed against her by the main opposition party the political crisis began over fitzgerald role in police whistleblower scandal
Original summary: start irish deputy prime minister resigns to avoid govt collapse end
Predicted summary: start pmo resigns from punjab to join nda end

Review: rr wicketkeeper batsman jos buttler slammed his 5th consecutive 50 in ipl 2018 on sunday to equal former indian cricketer virender sehwag record of most consecutive 50 scores in the ipl sehwag had achieved the feat while representing dd in the ipl 2012 buttler is also only the 2nd batsman after shane watson to hit 2 successive 90 scores in ipl
Original summary: start buttler equals sehwag record of most consecutive 50s in ipl end
Predicted summary: start sehwag slams sixes in an ipl over 100 times in ipl end

Review: maruti suzuki india on wednesday said it is recalling 640 units of its super carry mini trucks sold in the domestic market over possible defect in fuel pump supply the recall covers super carry units manufactured between january 20 and july 14 2018 the faulty parts in the affected vehicles will be replaced free of cost the automaker said
Original summary: start maruti recalls its mini trucks over fuel pump issue in india end
Predicted summary: start maruti suzuki recalls india over ₹3 crore end

Review: the arrested lashkar e taiba let terrorist aamir ben has confessed to the national investigation agency that pakistani army provided him cover firing to infiltrate into india he further revealed that hafiz organisation ud dawah arranged for his training and that he was sent across india to carry out subversive activities in and outside kashmir
Original summary: start pak helped me enter india arrested let terrorist to nia end
Predicted summary: start pak man who killed indian soldiers to enter kashmir end

Review: the 23 richest indians in the 500 member bloomberg billionaires index saw wealth erosion of 21 billion this year lakshmi mittal who controls the world largest steelmaker arcelormittal lost 5 6 billion or 29 of his net worth followed by sun pharma founder dilip shanghvi whose wealth declined 4 6 billion asia richest person mukesh ambani added 4 billion to his fortune
Original summary: start lakshmi mittal lost 10 bn in 2018 ambani added 4 bn end
Predicted summary: start india richest man lost billion in wealth in 2017 end

Conclusion

The encoder-decoder sequence-to-sequence model (LSTM) we built generated acceptable summaries from what it learned in the training texts. Although after 50 epochs the predicted summaries are not exactly on par with the expected summaries (our model hasn’t yet reached human-level intelligence!), the intelligence our model has gained definitely counts for something.

To attain more accurate results from this model, you can increase the size of the dataset, play around with the hyperparameters of the network, try making it larger, and increase the number of epochs.
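If you want a quantitative way to track whether such changes actually help, one option (not part of this tutorial’s code, so treat it as an optional, assumed add-on) is to score the predicted summaries against the reference summaries with ROUGE, for example via the third-party rouge-score package (pip install rouge-score):

# Optional: score a predicted summary against its reference with ROUGE.
# Assumes the third-party rouge-score package is installed (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

reference = 'n korea has returned remains of 200 us war dead trump'
prediction = 'n korea has lost a war against us trump'

scores = scorer.score(reference, prediction)
print(scores['rouge1'].fmeasure, scores['rougeL'].fmeasure)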

In this tutorial, you’ve trained an encoder-decoder sequence-to-sequence model to perform text summarization. In my next article you can learn all about attention mechanisms. Until then, happy learning!

Reference: Sandeep Bhogaraju
