Tuesday, July 4, 2017

How to fetch WWDC 2017 Video Subtitle to SRT format

Create and run this script, wwdc2017_fetch_srt.sh, to fetch the WWDC 2017 subtitles
Reference : https://github.com/wsvn53/wwdc2016-subtitles

wwdc2017_fetch_srt.sh
#!/bin/bash
# @Author: Ethan
# @Date: 2016-06-22 14:10:53
# @Last Modified by: javacom
# @Last Modified time: 2017-07-04

WWDC_YEAR=2017;
WWDC_SESSION_PREFIX=https://developer.apple.com/videos/play/wwdc$WWDC_YEAR;
WWDC_LOCAL_DIR=$(basename $WWDC_SESSION_PREFIX);

# Find the HLS playlist of a session and create one empty .srt file per mp4 variant
detect_video_m3u8 () {
    local session_url=$WWDC_SESSION_PREFIX/$SESSION_ID/;
    local session_html=$(curl -s $session_url);
    local video_url=$(echo "$session_html" | grep .m3u8 | grep $SESSION_ID | head -n1 | sed "s#.*\"\(https://.*m3u8\)\".*#\1#");
    echo "$session_html" | grep .mp4 | grep $SESSION_ID | sed "s#.*\"\(https://.*mp4\).*\".*#\1#" | while read mp4_url; do
        local mp4_filename=$(basename $mp4_url | cut -d. -f1);
        local srt_filename=$mp4_filename.srt;
        echo "> Subtitle local: $WWDC_LOCAL_DIR/$srt_filename" >&2;
        > $WWDC_LOCAL_DIR/$srt_filename;
    done
    echo "$video_url";
    echo "> Video: $video_url" >&2;
}

# Find the English subtitle playlist referenced by the video playlist
detect_subtitle_m3u8 () {
    local video_url=$1;
    local subtitle_uri=$(curl -s $video_url | grep "LANGUAGE=\"eng\"" | sed "s#.*URI=\"\(.*\)\"#\1#");
    local subtitle_url=$subtitle_uri;
    [[ "$subtitle_uri" != http* ]] && {
        subtitle_url=$(dirname $video_url)/$subtitle_uri;
    }
    echo "$subtitle_url";
    echo "> Subtitle: $subtitle_url" >&2;
}

# Append every webvtt segment to the .srt files of the session
download_subtitle_contents () {
    local subtitle_url=$1;
    echo "> Downloading... "
    local subtitle_base_url=$(dirname $subtitle_url);
    curl -s $subtitle_url | grep "webvtt" | while read webvtt; do
        local subtitle_webvtt=$subtitle_base_url/$webvtt;
        #echo "- get $subtitle_webvtt";
        local subtitle_content=$(curl -s $subtitle_webvtt);
        ls $WWDC_LOCAL_DIR/"$SESSION_ID"_* | while read srt_file; do
            echo "$subtitle_content" >> $srt_file;
        done
    done
}

main () {
    [ ! -d $WWDC_LOCAL_DIR ] && {
        mkdir $WWDC_LOCAL_DIR;
    }
    curl -s $WWDC_SESSION_PREFIX | grep /videos/play/wwdc$WWDC_YEAR | sed "s#.*/videos/play/wwdc$WWDC_YEAR/\([0-9]\{3\}\).*#\1#" | sort | uniq | while read SESSION_ID; do
        #echo "SESSION_ID is" $SESSION_ID
        local video_url=$(detect_video_m3u8 $SESSION_ID);
        local subtitle_url=$(detect_subtitle_m3u8 $video_url);
        download_subtitle_contents $subtitle_url;
    done
}

main;
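The subtitle lookup in detect_subtitle_m3u8 can be easier to follow outside of grep and sed. Below is a minimal Python sketch of that one step, assuming the master playlist lists subtitles on a line containing LANGUAGE="eng" and URI="...", as the script expects. The function name find_english_subtitle_url, the sample playlist, and the example.com URL are made up for illustration.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7


def find_english_subtitle_url(master_playlist_text, master_playlist_url):
    """Return the absolute URL of the English subtitle playlist, or None."""
    for line in master_playlist_text.splitlines():
        if 'LANGUAGE="eng"' not in line:
            continue
        match = re.search(r'URI="([^"]*)"', line)
        if match:
            # urljoin leaves absolute URIs untouched and resolves relative
            # ones against the directory of the master playlist URL.
            return urljoin(master_playlist_url, match.group(1))
    return None


# Made-up playlist and URL, for illustration only.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="eng",URI="subtitles/eng/prog_index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1000000,SUBTITLES="subs"
0640/0640.m3u8
"""
SAMPLE_URL = "https://example.com/wwdc2017/102/hls_vod_mvp.m3u8"

print(find_english_subtitle_url(SAMPLE_PLAYLIST, SAMPLE_URL))
```

Resolving with urljoin covers both cases the shell script handles by hand: a URI that already starts with http, and a relative one that must be prefixed with the playlist's directory.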




Run this shell script to convert the downloaded subtitles to SRT format

shellscript.sh
cd wwdc2017
mkdir -p sd
mkdir -p hd
for i in ???_sd_*.srt; do
    sed -e '/WEBVTT/d;/X-TIMESTAMP/d;' $i |
    awk '/^[0-9]{2}:[0-9]{2}:/ {seen[$0]++; skipduplicated=0} {if (seen[$0]>1) skipduplicated=1; if (!skipduplicated) print $0}' |
    awk -v RS="" '{gsub("\n", "-Z"); print}' |
    awk '$0 !~/^WEB/ {print $0}' |
    uniq |
    awk '{printf "\n%s-Z%s", NR,$0 }' |
    awk -v ORS="\n\n" '{gsub("-Z", "\n"); print}' |
    sed -e 's/.A:middle$//g;s/&gt;/>/g;s/&lt;/</g;1,2d;' > sd/$i;
done
for i in ???_hd_*.srt; do
    sed -e '/WEBVTT/d;/X-TIMESTAMP/d;' $i |
    awk '/^[0-9]{2}:[0-9]{2}:/ {seen[$0]++; skipduplicated=0} {if (seen[$0]>1) skipduplicated=1; if (!skipduplicated) print $0}' |
    awk -v RS="" '{gsub("\n", "-Z"); print}' |
    awk '$0 !~/^WEB/ {print $0}' |
    uniq |
    awk '{printf "\n%s-Z%s", NR,$0 }' |
    awk -v ORS="\n\n" '{gsub("-Z", "\n"); print}' |
    sed -e 's/.A:middle$//g;s/&gt;/>/g;s/&lt;/</g;1,2d;' > hd/$i;
done
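For readers who find the sed/awk pipeline hard to follow, here is a rough Python sketch of the same idea: drop the WEBVTT and X-TIMESTAMP header lines, skip cues whose timing line was already seen (the downloaded segments overlap), number the remaining cues, and unescape &gt; and &lt;. It assumes timing lines of the form hh:mm:ss.mmm --> hh:mm:ss.mmm. Unlike the shell script above, it also rewrites the millisecond separator from "." to ",", which is the usual SRT convention; that step is an addition, not something the script does. The function name webvtt_to_srt and the sample text are made up.

```python
import re

TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})"
)


def webvtt_to_srt(webvtt_text):
    """Convert concatenated WebVTT segments into numbered SRT cues."""
    cues = []        # each cue is a list: [timing_line, text_line, ...]
    seen = set()     # timing lines already emitted
    current = None   # cue currently collecting text, or None
    for raw_line in webvtt_text.splitlines():
        line = raw_line.strip()
        if line.startswith("WEBVTT") or line.startswith("X-TIMESTAMP"):
            current = None  # segment header: never part of a cue
            continue
        match = TIMING_LINE.match(line)
        if match:
            # Drops trailing cue settings such as "A:middle" and uses "," for ms.
            timing = "%s,%s --> %s,%s" % match.groups()
            if timing in seen:
                current = None  # duplicate cue from an overlapping segment
            else:
                seen.add(timing)
                current = [timing]
                cues.append(current)
        elif not line:
            current = None  # a blank line ends the cue
        elif current is not None:
            current.append(line.replace("&gt;", ">").replace("&lt;", "<"))
    blocks = ["\n".join([str(n)] + cue) for n, cue in enumerate(cues, 1)]
    return "\n\n".join(blocks) + "\n" if blocks else ""


# Made-up input: two segments whose middle cue overlaps.
SAMPLE_VTT = """WEBVTT
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000

00:00:01.000 --> 00:00:03.500 A:middle
Hello &lt;world&gt;

00:00:03.500 --> 00:00:05.000 A:middle
Second line
WEBVTT
X-TIMESTAMP-MAP=MPEGTS:181083,LOCAL:00:00:00.000

00:00:03.500 --> 00:00:05.000 A:middle
Second line

00:00:05.000 --> 00:00:06.000 A:middle
Third
"""

print(webvtt_to_srt(SAMPLE_VTT))
```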




Run this script wwdc2017_fetch_mp4.sh to download all mp4 videos

wwdc2017_fetch_mp4.sh
#!/bin/sh
# @Last Modified by: javacom
# @Last Modified time: 2016-07-08

WWDC_YEAR=2017;
WWDC_SESSION_PREFIX=https://developer.apple.com/videos/play/wwdc$WWDC_YEAR;
WWDC_LOCAL_DIR=$(basename $WWDC_SESSION_PREFIX);

download_mp4_video () {
    local session_url=$WWDC_SESSION_PREFIX/$SESSION_ID/;
    local session_html=$(curl -s $session_url);
    local video_url=$(echo "$session_html" | grep .m3u8 | grep $SESSION_ID | head -n1 | sed "s#.*\"\(https://.*m3u8\)\".*#\1#");
    echo "$session_html" | grep .mp4 | grep $SESSION_ID | sed "s#.*\"\(https://.*mp4\).*\".*#\1#" | while read mp4_url; do
        local mp4_filename=$(basename $mp4_url);
        if [ -e $WWDC_LOCAL_DIR/$mp4_filename ]
        then
            echo "> MP4 already existed : $WWDC_LOCAL_DIR/$mp4_filename" >&2;
        else
            echo "> MP4 Downloading... : $mp4_url" >&2;
            curl -o $WWDC_LOCAL_DIR/$mp4_filename $mp4_url
        fi
    done
}

main () {
    [ ! -d $WWDC_LOCAL_DIR ] && {
        mkdir $WWDC_LOCAL_DIR;
    }
    curl -s $WWDC_SESSION_PREFIX | grep /videos/play/wwdc$WWDC_YEAR | sed "s#.*/videos/play/wwdc$WWDC_YEAR/\([0-9]\{3\}\).*#\1#" | sort | uniq | while read SESSION_ID; do
        download_mp4_video $SESSION_ID;
    done
}

main;


One-line version of wwdc2017_fetch_mp4.sh to download all mp4 videos

wwdc2017_fetch_mp4.sh
# one liner for hd videos download
WWDCYEAR="wwdc2017"; for i in `curl -s https://developer.apple.com/videos/$WWDCYEAR/ | grep -o '<a href="/videos/play/'"$WWDCYEAR"'/[0-9]*' | cut -d '"' -f2 | sort | uniq`; do video_url=$(curl -s https://developer.apple.com${i} | grep -o 'http.*_hd_.*.mp4'); if [ ! -z "$video_url" ]; then mp4_filename=$(basename $video_url); if [ -e $mp4_filename ]; then echo "skipping $mp4_filename"; else echo "Downloading ... $mp4_filename";curl -O $video_url; fi; fi; done

# one liner for sd videos download
WWDCYEAR="wwdc2017"; for i in `curl -s https://developer.apple.com/videos/$WWDCYEAR/ | grep -o '<a href="/videos/play/'"$WWDCYEAR"'/[0-9]*' | cut -d '"' -f2 | sort | uniq`; do video_url=$(curl -s https://developer.apple.com${i} | grep -o 'http.*_sd_.*.mp4'); if [ ! -z "$video_url" ]; then mp4_filename=$(basename $video_url); if [ -e $mp4_filename ]; then echo "skipping $mp4_filename"; else echo "Downloading ... $mp4_filename";curl -O $video_url; fi; fi; done
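The two scraping steps in the one-liners (collect the unique session links from the index page, then pick the hd or sd mp4 link from each session page) can be sketched in Python as below. This only shows the text matching, not the downloading, and it assumes the pages contain links shaped the way the grep patterns above expect. The function names and the sample HTML are made up for illustration.

```python
import re


def extract_session_paths(index_html, wwdc_year="wwdc2017"):
    """Return the sorted, de-duplicated session paths linked from the index page."""
    pattern = re.compile(r'<a href="(/videos/play/%s/[0-9]+)' % re.escape(wwdc_year))
    return sorted(set(pattern.findall(index_html)))


def find_video_url(session_html, quality="hd"):
    """Return the first .mp4 URL whose name contains _<quality>_, or None."""
    pattern = re.compile(r'https?://[^"\s]*_%s_[^"\s]*\.mp4' % re.escape(quality))
    match = pattern.search(session_html)
    return match.group(0) if match else None


# Made-up HTML fragments, for illustration only.
SAMPLE_INDEX = """
<a href="/videos/play/wwdc2017/102/">Session B</a>
<a href="/videos/play/wwdc2017/101/">Session A</a>
<a href="/videos/play/wwdc2017/102/">Session B again</a>
<a href="/videos/play/wwdc2016/201/">Other year</a>
"""
SAMPLE_SESSION = (
    '<a href="https://example.com/wwdc2017/102/102_hd_sample.mp4?dl=1">HD</a> '
    '<a href="https://example.com/wwdc2017/102/102_sd_sample.mp4?dl=1">SD</a>'
)

print(extract_session_paths(SAMPLE_INDEX))
print(find_video_url(SAMPLE_SESSION, "hd"))
print(find_video_url(SAMPLE_SESSION, "sd"))
```

Excluding quotes and whitespace inside the URL keeps a match from running across two links on the same line, which the greedy 'http.*_hd_.*.mp4' pattern in the one-liner could do.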




Wednesday, June 7, 2017

How to train dataset in python and convert to CoreML model for iOS11

Reference http://machinelearningmastery.com/machine-learning-in-python-step-by-step/

Environment : macOS 10.12.4
matplotlib==2.0.0
numpy==1.12.1
pandas==0.19.2
scikit-learn==0.18.1
scipy==0.19.0
six==1.10.0
sklearn==0.18.1
coremltools==0.3.0
protobuf==3.3.0

Upgrade pip and install the following Python packages
shellscript.sh
pip install --upgrade pip
sudo -H pip install numpy scipy matplotlib pandas scikit-learn coremltools protobuf



Convert to Core ML
Run the following Python code to walk through machine learning in Python step by step and finally generate iris_lr.mlmodel
iris_learn.py
#!/usr/bin/env python

# Check the versions of libraries
# Python version
import sys
print('Python: {}'.format(sys.version))
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))

# Load libraries
import pandas
from pandas.tools.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)

# shape
print(dataset.shape)
# head
print(dataset.head(20))
# descriptions
print(dataset.describe())
# class distribution
print(dataset.groupby('class').size())

# box and whisker plots
dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.suptitle("Box and Whisker Plots for inputs")
plt.show()
# histograms
dataset.hist()
plt.suptitle('Histograms for inputs')
plt.show()
# scatter plot matrix
scatter_matrix(dataset)
plt.suptitle('Scatter Plot Matrix for inputs')
plt.show()

# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
Y = array[:,4]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)

# Test options and evaluation metric
seed = 7
scoring = 'accuracy'

# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))

# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)

# Compare Algorithms
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()

# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))

print("Make predictions on LogisticRegression Model")
model = LogisticRegression()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))

# print prediction results on test data
for i, prediction in enumerate(predictions):
    print('Predicted: %s, Target: %s %s' % (prediction, Y_validation[i], '' if prediction==Y_validation[i] else '(WRONG!!!)'))

#convert and save scikit.learn model
#support LogisticRegression of scikit.learn
print("Convert LogisticRegression Model to coreml model")
import coremltools
coreml_model = coremltools.converters.sklearn.convert(model, ["sepal-length", "sepal-width", "petal-length", "petal-width"], "class")

#set model metadata
coreml_model.author = 'Author'
coreml_model.license = 'BSD'
coreml_model.short_description = 'LogisticRegression on Iris flower data set'

#set features description manually
coreml_model.input_description['sepal-length'] = 'Sepal Length in centimetres'
coreml_model.input_description['sepal-width'] = 'Sepal Width in centimetres'
coreml_model.input_description['petal-length'] = 'Petal Length in centimetres'
coreml_model.input_description['petal-width'] = 'Petal Width in centimetres'

#set the output description
coreml_model.output_description['class'] = 'Distinguish the species'

#save the model
coreml_model.save('iris_lr.mlmodel')

from coremltools.models import MLModel
model = MLModel('iris_lr.mlmodel')
#get the spec of the model
print(model.get_spec())

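The script scores each candidate model with model_selection.KFold(n_splits=10). As a rough illustration of what such a split looks like, here is a plain-Python sketch that partitions sample indices into consecutive folds without shuffling. kfold_indices is a hypothetical helper written for this post, not the scikit-learn implementation, and it only mirrors the basic idea: each fold serves once as the test set while the rest is used for training.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for consecutive, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# The iris data has 150 rows; holding out 20% leaves 120 rows for training.
for train, test in kfold_indices(120, 10):
    print(len(train), len(test), test[0], test[-1])
```

With 120 training rows and 10 splits, every model is fitted ten times on 108 rows and scored on the remaining 12, which is where the mean and standard deviation printed by the script come from.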

Download Xcode 9 beta and the sample code from Apple

https://docs-assets.developer.apple.com/published/51ff0c1668/IntegratingaCoreMLModelintoYourApp.zip
Modify it and add the model to the Xcode project


Try the new refactoring tool in Xcode 9. It is amazing.


Train data using Neural Network Model Keras
Reference : http://machinelearningmastery.com/5-step-life-cycle-neural-network-models-keras/

shellscript.sh
# download training data
curl -O http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data

# install and activate virtual environment and install necessary python packages
# use deactivate to stop the python virtual env
sudo -H pip install --upgrade virtualenv
virtualenv --system-site-packages ~/tensorflow
source ~/tensorflow/bin/activate

# macOS, CPU only non-optimised, Python 2.7:
# https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0-py2-none-any.whl
# macOS, GPU enabled, Python 2.7:
# https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-1.1.0-py2-none-any.whl
# or find optimised wheel files from the community https://github.com/yaroslavvb/tensorflow-community-wheels/issues
# this optimised one (SSE4.1,SSE4.2,AVX,AVX2,FMA) works for Python 2.7 macOS 10.12 TensorFlow 1.1.0 CPU https://github.com/fdalvi/tensorflow-builds
# instruction to build your own python package https://ctmakro.github.io/site/on_learning/tf1c.html

# here, install the official non-optimised wheel file as below
pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0-py2-none-any.whl
pip install coremltools protobuf
pip install keras==1.2.2 h5py

Convert to Core ML
Run the following Python code in the virtual environment (tensorflow) to generate pima_keras.mlmodel
keras_learn.py
#!/usr/bin/env python
from keras.models import Sequential
from keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(7)

# load pima indians dataset
#dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
dataset = numpy.loadtxt("pima-indians-diabetes.data", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
#model.fit(X, Y, epochs=150, batch_size=10)
model.fit(X, Y, 10, 150) # Keras 1.2.2 positional arguments: batch_size=10, nb_epoch=150

# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

#convert and save keras model
model.save('pima.h5')
print("Convert Model to coreml model")
import coremltools
coreml_model = coremltools.converters.keras.convert('pima.h5')

#set model metadata
coreml_model.author = 'Author'
coreml_model.license = 'BSD'
coreml_model.short_description = 'pima-indians-diabetes'

#save the model
coreml_model.save('pima_keras.mlmodel')

from coremltools.models import MLModel
mlmodel = MLModel('pima_keras.mlmodel')
#get the spec of the model
print(mlmodel.get_spec())
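To make the network shape concrete, the sketch below computes by hand what the two Dense layers do for one sample: 8 inputs, 12 hidden units with relu, and 1 output unit with sigmoid. It uses plain Python rather than Keras, and the weights are placeholder values chosen for illustration, not trained ones. All function names here are hypothetical.

```python
import math


def relu(value):
    return max(0.0, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer; weights holds one row per unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def predict(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: 8 inputs -> 12 relu units -> 1 sigmoid output."""
    hidden = dense(features, hidden_weights, hidden_biases, relu)
    return dense(hidden, [output_weights], [output_bias], sigmoid)[0]


def parameter_count(n_inputs=8, n_hidden=12, n_outputs=1):
    """Number of weights and biases in the two layers."""
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)


# Placeholder weights, not trained values.
HIDDEN_WEIGHTS = [[0.0] * 8 for _ in range(12)]
HIDDEN_BIASES = [1.0] * 12
OUTPUT_WEIGHTS = [0.0] * 12
OUTPUT_BIAS = 0.0

probability = predict([1.0] * 8, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
print(probability, parameter_count())
```

The sigmoid output is a value between 0 and 1, which is why the model is compiled with the binary_crossentropy loss.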


Note: coremltools requires Python 2.7 (not 3.x) and supports only keras==1.2.2 with TensorFlow 1.0.x or 1.1.x. tensorflow_gpu requires Nvidia CUDA 8.0 and cuDNN v5.1 (which in turn require macOS 10.11/10.12), but recent Mac models all ship with AMD GPUs. Unless you can get an old Mac Pro upgraded with an Nvidia GPU that has at least 4 GB of video RAM, it is better to stay with the Mac's i7 CPU or to get a Linux machine just for training.

For Windows PCs, tensorflow/tensorflow_gpu is available only for 64-bit Python 3.5, as listed below. Because the current coremltools Keras converters are not compatible with Python 3.5, direct conversion is not yet available on Windows.
https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl
https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
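The version constraints in the note above can be checked before attempting a conversion. The sketch below encodes only what the note states (Python 2.7, keras 1.2.2, TensorFlow 1.0.x or 1.1.x); conversion_supported is a hypothetical helper, not part of coremltools.

```python
def conversion_supported(python_version, keras_version, tensorflow_version):
    """Return True if the versions match the combination described in the note."""
    python_ok = tuple(python_version[:2]) == (2, 7)
    keras_ok = keras_version == "1.2.2"
    # Compare only the major.minor part of the TensorFlow version string.
    tensorflow_ok = tensorflow_version.split(".")[:2] in (["1", "0"], ["1", "1"])
    return python_ok and keras_ok and tensorflow_ok


print(conversion_supported((2, 7, 13), "1.2.2", "1.1.0"))
print(conversion_supported((3, 5, 2), "1.2.2", "1.1.0"))
```

Inside the virtual environment it could be called as conversion_supported(sys.version_info, keras.__version__, tensorflow.__version__).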



keras-inception-test
Run the following Python code in the virtual environment (tensorflow) to test the Keras Inception V3 model. This will download the trained Inception V3 weights from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
shellscript.sh
git clone https://github.com/vml-ffleschner/coremltools-keras-inception-test
cd coremltools-keras-inception-test/

# based on the created virtualenv in ~/tensorflow as above
source ~/tensorflow/bin/activate

# additional installation of packages
pip install olefile pillow

# In playground.py, add the following lines after the print("CoreML Converted") line:
#   coreml_model.author = 'Author'
#   coreml_model.license = 'BSD'
#   coreml_model.short_description = 'Image InceptionV3 model'
#   coreml_model.save('Inceptionv3.mlmodel')
#   print("CoreML model file Created")

# note : coreml_model.predict requires macOS 10.13 High Sierra
python playground.py


Install the TensorFlow 1.1.0 library for Java
shellscript.sh
curl -O https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.1.0.jar
curl -O https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-darwin-x86_64-1.1.0.tar.gz

# install (tar -C needs the target directory to exist)
mkdir -p ./jni
tar xzvf libtensorflow_jni-cpu-darwin-x86_64-1.1.0.tar.gz -C ./jni

# compile and run HelloTF
javac -cp libtensorflow-1.1.0.jar HelloTF.java
java -cp libtensorflow-1.1.0.jar:. -Djava.library.path=./jni HelloTF