Showing posts with label machine learning. Show all posts

Sunday, June 17, 2018

How to install turicreate on macOS 10.14 beta

Install turicreate on macOS 10.14 beta 1
shell script
# upgrade pip
# curl https://bootstrap.pypa.io/get-pip.py | sudo python
curl https://bootstrap.pypa.io/get-pip.py | python

# install packages
sudo pip install requests==2.18.4 turicreate==5.0b1


(1) Test turicreate example - Image Classifier
shell script
mkdir -p $HOME/MLClassifier
cd $HOME/MLClassifier

# download dataset and cleanup
# (the URL is quoted so the shell does not treat "&" as "run in background")
curl -L -o dataset.zip "https://drive.google.com/uc?id=1ZLigrn7YcETalcj2qK6UqXceDdOV3244&export=download"
unzip dataset.zip
rm -fr __MACOSX; rm -f dataset/.DS_Store dataset/*/.DS_Store

# create python script
cat > classifier.py << 'EOF'
import turicreate as turi

# load images from dataset folder
url = "dataset/"
data = turi.image_analysis.load_images(url)

# define image categories
data["foodType"] = data["path"].apply(lambda path: "Rice" if "rice" in path else "Soup")

# create sframe
data.save("rice_or_soup.sframe")

# preview dataset
data.explore()

# load sframe
dataBuffer = turi.SFrame("rice_or_soup.sframe")

# create training data using 90% of dataset
trainingBuffers, testingBuffers = dataBuffer.random_split(0.9)

# create model
model = turi.image_classifier.create(trainingBuffers, target="foodType", model="squeezenet_v1.1", max_iterations=100)
# Alternate model use ResNet-50
# model = turi.image_classifier.create(trainingBuffers, target="foodType", model="resnet-50")

# evaluate model
evaluations = model.evaluate(testingBuffers)
print(evaluations["accuracy"])

# save model
model.save("rice_or_soup.model")
model.export_coreml("RiceSoupClassifier.mlmodel")
EOF

# run script
python classifier.py
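The labelling lambda in classifier.py only matches the lowercase substring "rice", so a folder named "Rice" would end up labelled "Soup". A minimal sketch of a case-insensitive variant, assuming the folder names might differ in case; food_type and the example paths are hypothetical and not taken from the downloaded dataset.

```python
def food_type(path):
    """Label an image by its path, ignoring letter case."""
    # Hypothetical helper: lower-casing the path first means
    # "dataset/Rice/02.JPG" is labelled "Rice" as well.
    return "Rice" if "rice" in path.lower() else "Soup"


# Made-up example paths, not files from the downloaded dataset
paths = ["dataset/rice/01.jpg", "dataset/Rice/02.JPG", "dataset/soup/03.jpg"]
labels = [food_type(p) for p in paths]
print(labels)
```

If this fits the dataset, the same function could be passed to data["path"].apply(...) in place of the lambda.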


(2) Test turicreate example - Logistic Regression
shell script
mkdir -p $HOME/LGClassifier
cd $HOME/LGClassifier

# create python script
cat > classifier.py << 'EOF'
import turicreate as turi

data = turi.SFrame('http://static.turi.com/datasets/regression/yelp-data.csv')
data['is_good'] = data['stars'] >= 3

# create sframe
data.save("yelp.sframe")

# preview dataset
#data.show()

# load sframe
dataBuffer = turi.SFrame("yelp.sframe")

# create training data using 80% of dataset
train_data, test_data = dataBuffer.random_split(0.8)

# create model
model = turi.logistic_classifier.create(train_data, target='is_good',
    features = ['user_avg_stars', 'business_avg_stars', 'user_review_count',
                'business_review_count', 'city', 'categories_dict'],
    max_iterations=200)
print(model)

# save predictions
predictions = model.classify(test_data)
print(predictions)

# evaluate model
evaluations = model.evaluate(test_data)
print("Accuracy : %s" % evaluations["accuracy"])
print("Confusion Matrix : \n%s" % evaluations["confusion_matrix"])
EOF

# run script
python classifier.py
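The script above prints an accuracy and a confusion matrix. A plain-Python sketch of what those two numbers mean, using made-up labels; this illustrates the definitions only and is not how turicreate computes them internally. The helper names accuracy and confusion_counts are hypothetical.

```python
from collections import Counter


def accuracy(targets, predictions):
    """Fraction of predictions that equal the target label."""
    correct = sum(1 for t, p in zip(targets, predictions) if t == p)
    return correct / float(len(targets))


def confusion_counts(targets, predictions):
    """Count how often each (target, predicted) pair occurs."""
    return Counter(zip(targets, predictions))


# Made-up labels for illustration (1 = is_good, 0 = not good)
targets = [1, 1, 0, 1, 0, 0, 1, 1]
predictions = [1, 0, 0, 1, 1, 0, 1, 1]

print("Accuracy : %s" % accuracy(targets, predictions))
for (target, predicted), count in sorted(confusion_counts(targets, predictions).items()):
    print("target=%s predicted=%s count=%s" % (target, predicted, count))
```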


(3) Some data manipulation tips when preparing training data
shell script
# remove the quotes and thousands separators from numbers in a csv file, typically produced by "save as CSV" from an Excel file
# for example, "222,267.87","455,365.44",... converts to 222267.87,455365.44,...
# (the two substitutions are separated by ";" so quoted numbers without a comma are unquoted too)
# In shell script
cat exceldata.csv | perl -p -e 's/,(?=[\d,.]*\d")//g; s/"(\d[\d,.]*)"/\1/g' > dataset.csv

# use map, lambda and zip functions to convert and compute numeric data from 2 data columns
# (the tuple-unpacking lambda below is Python 2 syntax)
# In python script
import math
data['rate'] = map(lambda (x,y): 0 if x is None or y is None else (0 if math.isnan(x) or math.isnan(y) or math.isinf(y) or x==0 else (999999 if math.isinf(x) or y==0 else 999999 if x/y > 999999 else x/y)) , zip(data['OS'], data['Total Amount']))

# replace training data when values are inf (infinity) or nan (Not A Number) in 'amount' column
# In python script
import math
train_data['amount'] = train_data['amount'].apply(lambda x: 0 if math.isnan(x) else x)
train_data['amount'] = train_data['amount'].apply(lambda x: 999 if math.isinf(x) else x)

# or use nested if else
# In python script
import math
train_data['amount'] = train_data['amount'].apply(lambda x: 0 if math.isnan(x) else (999 if math.isinf(x) else x))
print(train_data['amount'].summary())

# remove rows in training data with inf (infinity) or nan (Not A Number) values in 'amount' column
# In python script
import math
train_data = train_data[train_data['amount'].apply(lambda x: 0 if math.isinf(x) or math.isnan(x) else 1)]

# SFrame methods (beware, some of the methods do not work)
# https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.html

# Other SFrame data manipulation examples
# https://github.com/apple/turicreate/blob/master/userguide/sframe/data-manipulation.md
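The nested lambda for the 'rate' column is hard to read and only runs on Python 2. A sketch of the same rules as a named function, with one early return per rule; safe_rate and CAP are hypothetical names, and the sample values are made up.

```python
import math

CAP = 999999  # same upper limit as in the lambda above


def safe_rate(x, y):
    """Return x / y, mapping missing or degenerate inputs to 0 or CAP."""
    if x is None or y is None:
        return 0
    if math.isnan(x) or math.isnan(y) or math.isinf(y) or x == 0:
        return 0
    if math.isinf(x) or y == 0:
        return CAP
    ratio = x / float(y)  # float() avoids integer division on Python 2
    return CAP if ratio > CAP else ratio


# Made-up sample columns for illustration
os_column = [10.0, None, 5.0, float("nan"), 1e9]
total_column = [4.0, 3.0, 0, 2.0, 1.0]
rates = [safe_rate(x, y) for x, y in zip(os_column, total_column)]
print(rates)
```

The list comprehension replaces map with a tuple-unpacking lambda, so the same line should work on both Python 2 and Python 3.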


(4) Some data examination tips
shell script
# summary
print(train_data['amount'].summary())

# crosstab
import pandas as pd
pd.crosstab(data["Rating"], data["is_bad"], margins=True)

# custom frequency count for 'amount' column
# (the "30-40" bucket tests x <= 40)
import pandas as pd
pd.crosstab(train_data['amount'].apply(lambda x: " 0-10" if x <=10 else ("10-20" if x <=20 else ("20-30" if x <=30 else ("30-40" if x <=40 else ("40-50" if x <=50 else ">50"))))), "Count")
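The custom frequency count can also be written without nested conditionals. A sketch using a loop over the bucket upper bounds and collections.Counter, assuming 10-wide buckets up to 50; amount_bucket and the sample amounts are hypothetical.

```python
from collections import Counter


def amount_bucket(x):
    """Map an amount to a 10-wide bucket label; anything above 50 is '>50'."""
    for upper in (10, 20, 30, 40, 50):
        if x <= upper:
            # "%2d" pads the lower bound so " 0-10" sorts before "10-20"
            return "%2d-%d" % (upper - 10, upper)
    return ">50"


# Made-up amounts for illustration
amounts = [3, 10, 15, 35, 40, 50, 75]
frequency = Counter(amount_bucket(a) for a in amounts)
for label in sorted(frequency):
    print("%s : %d" % (label, frequency[label]))
```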


Wednesday, June 7, 2017

How to train dataset in python and convert to CoreML model for iOS11

Reference http://machinelearningmastery.com/machine-learning-in-python-step-by-step/

Environment : macOS 10.12.4
matplotlib==2.0.0
numpy==1.12.1
pandas==0.19.2
scikit-learn==0.18.1
scipy==0.19.0
six==1.10.0
sklearn==0.18.1
coremltools==0.3.0
protobuf==3.3.0

Upgrade pip and install the following Python packages
shellscript.sh
pip install --upgrade pip
sudo -H pip install numpy scipy matplotlib pandas sklearn coremltools protobuf



Convert to Core ML
Run the following Python code to walk through machine learning in Python step by step and finally generate iris_lr.mlmodel
iris_learn.py
#!/usr/bin/env python

# Check the versions of libraries
# Python version
import sys
print('Python: {}'.format(sys.version))
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))

# Load libraries
import pandas
from pandas.tools.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)

# shape
print(dataset.shape)
# head
print(dataset.head(20))
# descriptions
print(dataset.describe())
# class distribution
print(dataset.groupby('class').size())

# box and whisker plots
dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.suptitle("Box and Whisker Plots for inputs")
plt.show()

# histograms
dataset.hist()
plt.suptitle('Histograms for inputs')
plt.show()

# scatter plot matrix
scatter_matrix(dataset)
plt.suptitle('Scatter Plot Matrix for inputs')
plt.show()

# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
Y = array[:,4]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)

# Test options and evaluation metric
seed = 7
scoring = 'accuracy'

# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))

# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)

# Compare Algorithms
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()

# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))

print("Make predictions on LogisticRegression Model")
model = LogisticRegression()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))

# print prediction results on test data
for i, prediction in enumerate(predictions):
    print('Predicted: %s, Target: %s %s' % (prediction, Y_validation[i], '' if prediction==Y_validation[i] else '(WRONG!!!)'))

# convert and save scikit-learn model
# the scikit-learn converter supports LogisticRegression
print("Convert LogisticRegression Model to coreml model")
import coremltools
coreml_model = coremltools.converters.sklearn.convert(model, ["sepal-length", "sepal-width", "petal-length", "petal-width"], "class")

# set model metadata
coreml_model.author = 'Author'
coreml_model.license = 'BSD'
coreml_model.short_description = 'LogisticRegression on Iris flower data set'

# set features description manually
coreml_model.input_description['sepal-length'] = 'Sepal Length in centimetres'
coreml_model.input_description['sepal-width'] = 'Sepal Width in centimetres'
coreml_model.input_description['petal-length'] = 'Petal Length in centimetres'
coreml_model.input_description['petal-width'] = 'Petal Width in centimetres'

# set the output description
coreml_model.output_description['class'] = 'Distinguish the species'

# save the model
coreml_model.save('iris_lr.mlmodel')

from coremltools.models import MLModel
model = MLModel('iris_lr.mlmodel')
# get the spec of the model
print(model.get_spec())

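iris_learn.py scores every algorithm with 10-fold cross-validation on the training split. A plain-Python sketch of how consecutive, unshuffled folds can be cut from the row indices; it illustrates the idea only and is not scikit-learn's code. kfold_indices is a hypothetical helper name.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    # Spread any remainder over the first folds, one extra sample each
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# 150 iris rows with validation_size = 0.20 leave 120 training rows
folds = list(kfold_indices(120, 10))
for train, test in folds[:2]:
    print("train rows: %d, test rows: %d, first test index: %d" % (len(train), len(test), test[0]))
```

Each row lands in exactly one test fold, so every model is scored on all training rows once.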

Download Xcode 9 beta and the sample code from Apple

https://docs-assets.developer.apple.com/published/51ff0c1668/IntegratingaCoreMLModelintoYourApp.zip
Modify it and add the model to the Xcode project


Try the new refactoring tool in Xcode 9. It is amazing.


Train data using a Keras neural network model
Reference : http://machinelearningmastery.com/5-step-life-cycle-neural-network-models-keras/

shellscript.sh
# download training data
curl -O http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data

# install and activate virtual environment and install necessary python packages
# use deactivate to stop the python virtual env
sudo -H pip install --upgrade virtualenv
virtualenv --system-site-packages ~/tensorflow
source ~/tensorflow/bin/activate

# macOS, CPU only non-optimised, Python 2.7:
# https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0-py2-none-any.whl
# macOS, GPU enabled, Python 2.7:
# https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-1.1.0-py2-none-any.whl
# or find optimised wheel files from the community
# https://github.com/yaroslavvb/tensorflow-community-wheels/issues
# this optimised one (SSE4.1,SSE4.2,AVX,AVX2,FMA) works for Python 2.7 macOS 10.12 TensorFlow 1.1.0 CPU
# https://github.com/fdalvi/tensorflow-builds
# this one works for GeForce GT 650M GPU and CPU (SSE4.2, AVX) and CUDA 8.0, and cuDNN v5.1
# https://github.com/bodak/tensorflow-wheels/releases/tag/v1.1.0_27
# instruction to build your own python package
# https://ctmakro.github.io/site/on_learning/tf1c.html

# suppose, install the official non-optimised wheel file as below
pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0-py2-none-any.whl
pip install coremltools protobuf
pip install keras==1.2.2 h5py

Convert to Core ML
Run the following Python code in the virtual environment (tensorflow) to generate pima_keras.mlmodel
keras_learn.py
#!/usr/bin/env python
from keras.models import Sequential
from keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(7)

# load pima indians dataset
#dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
dataset = numpy.loadtxt("pima-indians-diabetes.data", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
#model.fit(X, Y, epochs=150, batch_size=10)
# keras 1.2.2 names the epoch argument nb_epoch instead of epochs
model.fit(X, Y, batch_size=10, nb_epoch=150)

# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# convert and save keras model
model.save('pima.h5')
print("Convert Model to coreml model")
import coremltools
coreml_model = coremltools.converters.keras.convert('pima.h5')

# set model metadata
coreml_model.author = 'Author'
coreml_model.license = 'BSD'
coreml_model.short_description = 'pima-indians-diabetes'

# save the model
coreml_model.save('pima_keras.mlmodel')

from coremltools.models import MLModel
mlmodel = MLModel('pima_keras.mlmodel')
# get the spec of the model
print(mlmodel.get_spec())
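keras_learn.py compiles the model with loss='binary_crossentropy' and metrics=['accuracy']. A plain-Python sketch of what those two quantities measure for a sigmoid output, using made-up targets and probabilities; the clipping value eps and the 0.5 threshold are assumptions for illustration and this is not Keras code.

```python
import math


def binary_crossentropy(targets, probabilities, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


def binary_accuracy(targets, probabilities, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the target."""
    hits = sum(1 for y, p in zip(targets, probabilities) if (p > threshold) == (y == 1))
    return hits / float(len(targets))


# Made-up targets and sigmoid outputs for illustration
targets = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.6]
print("loss: %.4f" % binary_crossentropy(targets, probabilities))
print("acc: %.2f" % binary_accuracy(targets, probabilities))
```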


Note: coremltools requires Python 2.7 (not 3.x) and supports keras==1.2.2 with TensorFlow (1.0.x, 1.1.x) only. tensorflow_gpu requires Nvidia CUDA 8.0 and cuDNN v5.1 (which also require macOS 10.11/10.12), but recent Mac models all ship with AMD GPUs. Unless you can get an old Mac Pro with an upgraded Nvidia GPU with at least 4 GB of video RAM, it is better to stay with the Mac's i7 CPU or get a Linux machine for training only.
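The version constraints in the note can be checked before starting a long training run. A minimal sketch, assuming the versions are passed in by hand; coreml_conversion_supported is a hypothetical helper that only encodes the rules stated in the note and does not query the installed packages.

```python
import sys


def coreml_conversion_supported(python_version, keras_version, tensorflow_version):
    """Check versions against the constraints listed in the note above."""
    python_ok = tuple(python_version[:2]) == (2, 7)
    keras_ok = keras_version == "1.2.2"
    tf_ok = tensorflow_version.startswith(("1.0.", "1.1."))
    return python_ok and keras_ok and tf_ok


# Check the running interpreter against the versions installed earlier
print(coreml_conversion_supported(sys.version_info, "1.2.2", "1.1.0"))
```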

Hardware reference for Linux : https://www.oreilly.com/learning/build-a-super-fast-deep-learning-machine-for-under-1000

For Windows PCs, tensorflow/tensorflow_gpu is only available for 64-bit Python 3.5, as listed below. Because the current coremltools Keras converters are not compatible with Python 3.5, direct conversion is not yet possible on a PC.
https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl
https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl



keras-inception-test
Run the following Python code in the virtual environment (tensorflow) to test the Keras Inception V3 model. This will download the trained Inception V3 weights from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
shellscript.sh
# GitHub no longer serves the git:// protocol, so clone over https
git clone https://github.com/vml-ffleschner/coremltools-keras-inception-test
cd coremltools-keras-inception-test/

# based on the created virtualenv in ~/tensorflow as above
source ~/tensorflow/bin/activate

# additional installation of packages
pip install olefile pillow

# Add the following lines after #print("CoreML Converted") in playground.py
#   coreml_model.author = 'Author'
#   coreml_model.license = 'BSD'
#   coreml_model.short_description = 'Image InceptionV3 model'
#   coreml_model.save('Inceptionv3.mlmodel')
#   print("CoreML model file Created")

# note : coreml_model.predict requires macOS 10.13 High Sierra
python playground.py


Install the TensorFlow 1.1.0 library for Java
shellscript.sh
curl -O https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.1.0.jar
curl -O https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-darwin-x86_64-1.1.0.tar.gz

# install (tar -C needs the target directory to exist)
mkdir -p ./jni
tar xzvf libtensorflow_jni-cpu-darwin-x86_64-1.1.0.tar.gz -C ./jni

# compile and run HelloTF
javac -cp libtensorflow-1.1.0.jar HelloTF.java
java -cp libtensorflow-1.1.0.jar:. -Djava.library.path=./jni HelloTF