I tested and it works. Thanks again, Luke!!
On Wed, Jan 27, 2016 at 4:18 PM, Luke Lovett <luke.lovett@xxxxxxxxx> wrote:
> I just resolved HADOOP-253; this should be fixed now in the master branch.
>
>
> On Wednesday, January 27, 2016 at 9:29:14 AM UTC-8, Luke Lovett wrote:
>>
>> I think it has to do with the code path that the compressed BSON takes.
>> The way the FileSystem is being retrieved is ignoring the scheme in the
>> URI. The fix for this issue is already in the works. By the time that 1.5
>> comes out this will no longer be a problem.
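
To make the scheme-ignoring lookup concrete, here is a toy Python model of it, not the connector's actual code. `ToyFileSystem`, `fs_from_default`, and `fs_from_path` are hypothetical names invented for this sketch; they loosely mirror Hadoop's `FileSystem.get(conf)` (which returns the default filesystem) and `Path.getFileSystem(conf)` (which honors the path's scheme). Which call the compressed-BSON code path actually made before the fix is an assumption.

```python
from urllib.parse import urlparse


class ToyFileSystem:
    """Hypothetical stand-in for a Hadoop FileSystem bound to one URI."""

    def __init__(self, uri):
        self.uri = uri

    def check_path(self, path):
        # Reject any fully qualified path that belongs to another filesystem.
        p, own = urlparse(path), urlparse(self.uri)
        if p.scheme and (p.scheme, p.netloc) != (own.scheme, own.netloc):
            raise ValueError("Wrong FS: %s, expected: %s" % (path, self.uri))
        return path


def fs_from_default(conf):
    # Ignores the path entirely: always returns the configured default FS.
    return ToyFileSystem(conf["fs.defaultFS"])


def fs_from_path(path, conf):
    # Honors the scheme in the path, falling back to the default FS.
    p = urlparse(path)
    if p.scheme:
        return ToyFileSystem("%s://%s" % (p.scheme, p.netloc))
    return fs_from_default(conf)


conf = {"fs.defaultFS": "hdfs://10.0.2.139:9000"}
path = "s3n://my-bucket/compressed_bson.gz"

fs_from_path(path, conf).check_path(path)  # scheme honored: no error

message = None
try:
    fs_from_default(conf).check_path(path)  # scheme ignored
except ValueError as err:
    message = str(err)
```

With the default filesystem set to HDFS, the first lookup resolves the `s3n` path without complaint, while the second reproduces the shape of the `Wrong FS` error in the traceback quoted further down.
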
>>
>> On Wednesday, January 27, 2016 at 7:16:35 AM UTC-8, Rafael Aguiar wrote:
>>>
>>> Luke,
>>>
>>> I can read a regular BSON file from S3; it's only when I try the compressed
>>> ones that I see that error.
>>>
>>> I'll try your suggestion, though.
>>>
>>> On Tuesday, January 26, 2016 at 17:02:41 UTC-3, Luke Lovett wrote:
>>>>
>>>> It doesn't look like the problem is with the fact that the BSON is
>>>> compressed, but the fact that Hadoop has not been configured to use the s3
>>>> filesystem (it's expecting HDFS, apparently). That the connector doesn't
>>>> pick up on the fact that "s3n://" means s3 is the connector's problem (I
>>>> just filed HADOOP-253), but you can work around the problem by configuring
>>>> Hadoop to use s3 (and only s3) by setting "fs.default.name" and
>>>> "fs.defaultFS".
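
A minimal sketch of that workaround from the PySpark side, assuming the per-job `conf` dict passed to `newAPIHadoopRDD` is where the two properties get set. `with_default_fs` is a hypothetical helper, not part of pyspark or mongo-hadoop, and whether per-job settings are sufficient (as opposed to editing core-site.xml) is untested here.

```python
from urllib.parse import urlparse


def with_default_fs(input_uri, extra_conf=None):
    """Build a Hadoop conf dict whose default filesystem matches input_uri."""
    parsed = urlparse(input_uri)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError("expected scheme://bucket/path, got %r" % input_uri)
    default_fs = "%s://%s" % (parsed.scheme, parsed.netloc)
    conf = {
        "mapred.input.dir": input_uri,
        "fs.default.name": default_fs,  # older property name
        "fs.defaultFS": default_fs,     # newer property name
    }
    conf.update(extra_conf or {})
    return conf


job_conf = with_default_fs("s3n://my-bucket/compressed_bson.gz")
```

The resulting dict would then be passed as `conf=job_conf` in the `sc.newAPIHadoopRDD(...)` call quoted further down. It points every unqualified path at S3, which is the "and only s3" caveat above.
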
>>>>
>>>> On Monday, January 25, 2016 at 10:32:47 PM UTC-8, Rafael Aguiar wrote:
>>>>>
>>>>> I'm using pyspark (on Spark 1.3.1) along with the mongo-hadoop jar,
>>>>> both built from the master branch.
>>>>>
>>>>> rdd = sc.newAPIHadoopRDD(
>>>>>     inputFormatClass='com.mongodb.hadoop.BSONFileInputFormat',
>>>>>     keyClass='org.apache.hadoop.io.Text',
>>>>>     valueClass='org.apache.hadoop.io.MapWritable',
>>>>>     conf={
>>>>>         'mapred.input.dir': 's3n://my-bucket/compressed_bson.gz'
>>>>>     }
>>>>> )
>>>>>
>>>>>
>>>>> When I try to create the RDD above I get the following error:
>>>>>
>>>>> INFO hadoop.BSONFileInputFormat: File s3n://my-bucket/compressed_bson.gz is compressed so cannot be split.
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 6, in <module>
>>>>>   File "/home/hadoop/spark/python/pyspark/context.py", line 547, in newAPIHadoopRDD
>>>>>     jconf, batchSize)
>>>>>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>>>>>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
>>>>> : java.lang.IllegalArgumentException: Wrong FS: s3n://my-bucket/compressed_bson.gz, expected: hdfs://10.0.2.139:9000
>>>>>
>>>>> Has anyone faced something similar?
>>>>>
>>>> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
>
> For other MongoDB technical support options, see:
> http://www.mongodb.org/about/support/.
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "mongodb-user" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/mongodb-user/2jcrxOdRuFo/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> mongodb-user+unsubscribe@xxxxxxxxxxxxxxxx.
> To post to this group, send email to mongodb-user@xxxxxxxxxxxxxxxx.
> Visit this group at https://groups.google.com/group/mongodb-user.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mongodb-user/9b9e529c-9d0d-4d07-835c-1584124b80eb%40googlegroups.com
> <https://groups.google.com/d/msgid/mongodb-user/9b9e529c-9d0d-4d07-835c-1584124b80eb%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
--
Rafael Aguiar
Data Science Engineer
Mobile: +55 81 99730.0415 <callto://+5581997300415>
Skype: rafael_aguiar_
Office: +55 81 3127.0881 <callto://+558131270881>
Website: inlocomedia.com <http://www.inlocomedia.com/>
Facebook: <https://www.facebook.com/inlocomedia>